Introduction

This paper is the fourth in a series of publications. The purpose of these papers is to provide supplementary reading for students of applied statistics (see O'Brien, 1982a, 1982b, 1982c). My intended audience is social science graduate and advanced undergraduate students familiar with applied statistics. The minimum background for most of the existing and forthcoming papers is knowledge of applied statistics through rudimentary analysis of variance and multiple correlation and regression analysis.

The unique feature of this set of papers is detailed proofs and derivations of important formulas that are not readily available in textbooks, journal articles, and other similar sources. Each proof or derivation is presented in a clear, detailed, and consistent fashion. When necessary, a review of relevant algebra is provided. Calculus is not used or assumed.

As a former instructor of applied statistics at the graduate level, I know that many students are very capable of understanding the proofs and derivations presented in these papers. My experience has been that many students desire to see a full, comprehensible statement of a mathematical argument. This series seeks to address such needs.

The present paper is a companion work to an earlier paper (O'Brien, 1982c). Each is a derivation of the multiple correlation formula for the linear model. The first paper presented a detailed derivation of the multiple correlation formula for standard (z) scores.* The present paper is a derivation of the multiple correlation formula for unstandardized (raw) scores. Readers should find each paper interesting and informative.

* Typographical errors appeared in that paper. For the reader's convenience, corrections are summarized in Appendix B of the present paper. The author would be grateful if other errors in that paper or the present paper were communicated to him.



The two papers taken together are meant to be preparatory reading for a related paper.*

Overview of Derivation



In this paper we will present a derivation of the linear multiple 
correlation formula for raw scores. The basic objective is to derive 
this formula for one raw score criterion (dependent variable) and 
any finite number of raw score predictors (independent variables). 

Let us first state the formula we will derive and introduce the 
notation used. The linear multiple correlation between one criterion 
and p predictors can be expressed as: 



$$R_{Y\cdot x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + \cdots + b_j r_{yj} S_y S_j + \cdots + b_p r_{yp} S_y S_p}}{S_y}$$

Writing the right hand side in summation notation:

$$R_{Y\cdot x_1,x_2,\ldots,x_j,\ldots,x_p} = \frac{\sqrt{\sum_{j=1}^{p} b_j r_{yj} S_y S_j}}{S_y}$$



where:

$R_{Y\cdot x_1,x_2,\ldots,x_j,\ldots,x_p}$ = multiple correlation of raw scores,
$Y$ = the observed raw score criterion to be predicted,
$x_1, x_2, \ldots, x_j, \ldots, x_p$ = raw score predictors of the criterion,
$b_1, b_2, \ldots, b_j, \ldots, b_p$ = slope coefficients or regression weights,
$r_{y1}, r_{y2}, \ldots, r_{yj}, \ldots, r_{yp}$ = product moment criterion/predictor correlations,
$S_1, S_2, \ldots, S_j, \ldots, S_p$ = standard deviations of the predictors,
$S_y$ = the standard deviation of the criterion.

* Forthcoming with the expected title: "A Derivation of the Unbiased Sample Standard Error of Estimate: the General Case." It will appear in ERIC.

This is the formula that is derived in this paper. We will first present a derivation for the simplest multivariate case: one criterion and two predictors. A derivation is then presented for three predictors. The latter derivation is a useful exercise because it allows a review of the logic and procedures used in the derivation. In addition, it will motivate the use of summation when the algebra becomes complex. The derivation is then presented for the general case of p (finite) predictors. An integral part of this paper is Appendix A. In that appendix, a method is presented for finding the "normal equations" in regression analysis for raw score linear models.

Prior to starting the derivation for two predictors, let us outline the plan which will be followed in the derivations. The steps we will use are:

1. state the regression model
2. derive the normal equations (see Appendix A)
3. define the multiple correlation
4. apply rules of covariance and variance algebra to simplify the definitional form of the multiple correlation formula
5. substitute the normal equations into the multiple correlation formula
6. simplify.

We will refine these steps to suit a particular application.
Brief Overview of Regression Analysis and Derivation for Two Predictors

In this section we will review the basic concepts, logic, and our notation for regression analysis. Introductory applied statistics textbooks can be consulted for more detailed information on regression analysis theory. See, for example, Lindeman et al., 1982. The intention in this section is to review the rationale of regression analysis.

The primary use of statistical regression analysis is controlled prediction and explanation of quantitative data. The basic principle behind regression analysis involves selecting a general mathematical function that best matches the underlying form of the relationship among the variables involved in the prediction. Assume one is attempting to predict one raw score criterion by use of two raw score predictors. Assume further that the relationship between each predictor and the criterion is linear in form. The mathematical function most often selected to obtain the best linear "fit" under these conditions is provided by the following equation:



$$\hat{Y} = a + b_1 x_1 + b_2 x_2$$

where:

$\hat{Y}$ = the predicted (not actual or observed) criterion,
$a, b_1, b_2$ = constants to be selected by the "least squares" procedure; $a$ = the intercept, and $b_1$ and $b_2$ = slope coefficients,
$x_1, x_2$ = predictor variables in deviation score form.

It is conventional to express the predictor variables in deviation score form. That is, for each predictor, first find its mean and then subtract the mean from each score on that predictor. For example,

$$x_1 = X_1 - \bar{X}_1, \qquad x_2 = X_2 - \bar{X}_2$$

Here, for either variable, "cap X" ($X$) is the actual (or gross) raw score and $\bar{X}$ is its arithmetic mean. It is not necessary for any mathematical reason to re-express the predictors in deviation score form. This is done simply to make the algebra more tractable; as such, it is a matter of convenience. Note that we do not re-express $Y$ (or $\hat{Y}$) as deviations. We could re-express each type of criterion; however, we have chosen not to, following the convention of most authors.
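The claim that re-expressing the predictors as deviations is only a convenience can be checked numerically. The sketch below is a minimal illustration, assuming made-up data and a small hand-written Gaussian elimination routine (the names `solve`, `fit_two`, and `center` are ours, not the paper's). It fits $a + b_1 X_1 + b_2 X_2$ by solving the least squares normal equations (developed later in this paper) once with gross raw-score predictors and once with deviation-score predictors. If re-expression is merely a convenience, the slopes should agree and only the intercept should change, becoming $\bar{Y}$ in the deviation form.

```python
def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(t)] for row, t in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def dot(u, v):
    return sum(s * t for s, t in zip(u, v))


def fit_two(P1, P2, Y):
    """Least squares (a, b1, b2) for a + b1*P1 + b2*P2, from the normal equations."""
    A = [[len(Y), sum(P1), sum(P2)],
         [sum(P1), dot(P1, P1), dot(P1, P2)],
         [sum(P2), dot(P1, P2), dot(P2, P2)]]
    return solve(A, [sum(Y), dot(P1, Y), dot(P2, Y)])


def center(X):
    """Deviation scores: subtract the mean from each raw score."""
    m = sum(X) / len(X)
    return [v - m for v in X]


# Hypothetical data, made up for illustration only.
X1 = [2, 4, 5, 7, 9]
X2 = [1, 3, 2, 5, 4]
Y = [3, 6, 7, 10, 12]

a_raw, b1_raw, b2_raw = fit_two(X1, X2, Y)                  # gross raw-score predictors
a_dev, b1_dev, b2_dev = fit_two(center(X1), center(X2), Y)  # deviation-score predictors
print(b1_raw, b1_dev, b2_raw, b2_dev, a_raw, a_dev)
```

With these data both fits give $b_1 = 7/6$ and $b_2 = 4/15$ (up to rounding); the intercept is 0.5 with gross scores and $\bar{Y} = 7.6$ with deviation scores.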

Using deviation scores for the predictors, we can now write the two predictor raw score model as follows:

$$\hat{Y} = a + b_1(X_1 - \bar{X}_1) + b_2(X_2 - \bar{X}_2) = a + b_1 x_1 + b_2 x_2$$

As stated, we will use the second form in this paper.*

The regression model stated above is an idealized mathematical model. If the relationships in a variable set consisting of one criterion and two predictors can be assumed to be linear, then the model is a reasonable one to apply for prediction of actual or observed criterion scores. It is idealized in the sense that it assumes no error is made in the prediction of $Y$. In practice, when an actual criterion score is compared to the criterion score generated by the model, some error is likely to occur; the "fit" is less than perfect. If we call the actual sample raw score criterion $Y$, we can state another model (an observed raw score model):

$$Y = \hat{Y} + e$$

where:

$e$ = the amount of numerical error resulting from using the idealized mathematical model ($\hat{Y}$) to predict the actual criterion score ($Y$).

That is, an actual criterion consists of a predicted quantity plus an error component.

* Readers of the 1982c paper may wonder why, on page 2 thereof, the raw score regression model was stated in terms of gross raw score (and not deviation score) predictors. As stated, it is not mathematically necessary to re-express. In any case, the major result we are seeking in this paper is unaffected by the initial form of the predictors. The derivation could be made without the translation of predictors into deviation score form, but the result would involve unnecessary and unwanted complexities. Practically speaking, this paper would have been very much longer if re-expression had not been done.

The error made in predicting the observed criterion score by the idealized mathematical model is:

$$e = Y - \hat{Y}$$

This is the quantity we want to be as small as possible in order to minimize the error in prediction. It can be seen that, if $e = 0$, the actual criterion is perfectly predicted by the idealized model ($Y = \hat{Y}$).

The technique most often used in the social sciences to accomplish this goal is the "least squares" procedure. Essentially, this procedure seeks to maximize predictability by minimizing prediction error. The least squares criterion or goal is summarized in the following expression:*

$$\sum (Y - \hat{Y})^2 = \sum e^2 = \text{a minimum}$$

If we substitute the quantity for $\hat{Y}$ previously defined, we can rewrite the least squares criterion as:

$$\sum \left[ Y - (a + b_1 x_1 + b_2 x_2) \right]^2 = \sum (Y - a - b_1 x_1 - b_2 x_2)^2 = \sum e^2 = \text{a minimum}$$

(As an aside, "least squares" means we determine values for $a$, $b_1$, and $b_2$ in $\hat{Y}$ such that the sum of squared errors takes the least possible value.)

* If it is understood that the summation limits range from the first observation ($i = 1$) to the last ($i = n$), then we can drop the summation limits. Here $n$ refers to the total number of observations for the criterion and predictors. This sample size is the same regardless of the number of predictors in the regression model. Later in the paper, when the algebra becomes more complex, we use summation limits extensively.
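To make the criterion concrete, the sketch below (hypothetical data; the function name `sse` is ours) computes $\sum e^2$ for given constants. As a small check on the idea, it holds two trial slopes fixed and shows that, with deviation-score predictors, moving $a$ away from $\bar{Y}$ only increases the sum of squared errors, by exactly $n d^2$ for a shift of $d$. This anticipates the result $\bar{Y} = a$ obtained from the normal equations below; it does not by itself find the least squares slopes.

```python
def sse(Y, x1, x2, a, b1, b2):
    """Least squares criterion: the sum of e**2, where e = Y - (a + b1*x1 + b2*x2)."""
    return sum((y - (a + b1 * u + b2 * v)) ** 2 for y, u, v in zip(Y, x1, x2))


# Hypothetical data, made up for illustration only.
X1 = [2, 4, 5, 7, 9]
X2 = [1, 3, 2, 5, 4]
Y = [3, 6, 7, 10, 12]
x1 = [v - sum(X1) / len(X1) for v in X1]  # deviation scores
x2 = [v - sum(X2) / len(X2) for v in X2]
y_bar = sum(Y) / len(Y)

b1, b2 = 1.0, 0.5  # arbitrary trial slopes, held fixed (not the least squares values)
at_mean = sse(Y, x1, x2, y_bar, b1, b2)
# Because sum(x1) = sum(x2) = 0, sse(y_bar + d) = sse(y_bar) + n * d**2.
shifted = [sse(Y, x1, x2, y_bar + d, b1, b2) for d in (-1.0, -0.1, 0.1, 1.0)]
print(at_mean, shifted)
```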

Normal Equations

Having stated the multiple regression model for two predictors, we now derive the so-called "normal equations". A discussion of the procedures and results we will need is presented in Appendix A. The reader may wish to read Appendix A at this point (or take the next step on faith).

The normal equations are derived from the least squares criterion using calculus. The basic idea behind the technique for two predictors is to generate an equation for each of the constants in the regression model ($a$, $b_1$, and $b_2$). For the two predictor model, the normal equations for $a$, $b_1$, and $b_2$, respectively, are found to be:

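$$\begin{aligned}
\sum Y &= na + b_1 \sum x_1 + b_2 \sum x_2 \\
\sum x_1 Y &= a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 \\
\sum x_2 Y &= a \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2
\end{aligned}$$

In the first normal equation (for $a$), $n$ is the sample size.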
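The three normal equations can be solved directly as a 3 x 3 linear system. The sketch below does so for made-up data (the helper names `solve` and `dot` are ours, and Gaussian elimination is simply a convenient solver, not the paper's method) and then checks what the normal equations assert about the residuals $e = Y - \hat{Y}$: they sum to zero and have zero cross-products with each deviation-score predictor. Because $\sum x_1 = \sum x_2 = 0$, the solution also gives $a = \bar{Y}$, as shown in the text that follows.

```python
def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(t)] for row, t in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def dot(u, v):
    return sum(s * t for s, t in zip(u, v))


# Hypothetical data, made up for illustration only.
X1 = [2, 4, 5, 7, 9]
X2 = [1, 3, 2, 5, 4]
Y = [3, 6, 7, 10, 12]
n = len(Y)
x1 = [v - sum(X1) / n for v in X1]  # deviation scores
x2 = [v - sum(X2) / n for v in X2]

# The three normal equations, term by term as written in the text.
A = [[n, sum(x1), sum(x2)],
     [sum(x1), dot(x1, x1), dot(x1, x2)],
     [sum(x2), dot(x1, x2), dot(x2, x2)]]
rhs = [sum(Y), dot(x1, Y), dot(x2, Y)]
a, b1, b2 = solve(A, rhs)

e = [y - (a + b1 * u + b2 * v) for y, u, v in zip(Y, x1, x2)]  # residuals
print(a, b1, b2)
print(sum(e), dot(e, x1), dot(e, x2))  # all three are zero up to rounding
```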

These normal equations can be simplified by substituting various descriptive statistics into terms of the equations. Other terms will cancel in the process. For the reader's convenience in following the substitutions, some basic formulas for sample descriptive statistics are presented in Table 1.



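Table 1

Descriptive Sample Statistics

$$\begin{array}{lll}
\text{Statistic} & \text{Raw Score Form} & \text{Deviation Score Form} \\
\hline
\text{Mean} & \bar{X}_1 = \dfrac{\sum X_1}{n} & \text{same} \\[2ex]
\text{Variance} & S_1^2 = \dfrac{\sum (X_1 - \bar{X}_1)^2}{n-1} & S_1^2 = \dfrac{\sum x_1^2}{n-1} \\[2ex]
\text{Standard Deviation} & S_1 = \sqrt{\dfrac{\sum (X_1 - \bar{X}_1)^2}{n-1}} & S_1 = \sqrt{\dfrac{\sum x_1^2}{n-1}} \\[2ex]
\text{Correlation of } X_1 \text{ and } X_2 & r_{12} = \dfrac{\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)}{(n-1) S_1 S_2} & r_{12} = \dfrac{\sum x_1 x_2}{(n-1) S_1 S_2} \\[2ex]
\text{Correlation of } Y \text{ and } X_1 & r_{y1} = \dfrac{\sum (X_1 - \bar{X}_1)(Y - \bar{Y})}{(n-1) S_1 S_y} & r_{y1} = \dfrac{\sum x_1 y}{(n-1) S_1 S_y} \ (\text{where } y = Y - \bar{Y})
\end{array}$$

Note: For "mean" it is understood that the summation extends across all $n$ values of $X_1$ (and across all $n$ pairs of values for "correlation"). This applies equally to the other statistics defined in the table.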
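The formulas in Table 1 translate directly into code. The sketch below is one possible rendering (the function names are ours; the data are made up). It uses the $n - 1$ divisor throughout, which matches Python's `statistics.variance` and `statistics.stdev`, so those serve as a cross-check.

```python
import math
import statistics


def mean(X):
    """Arithmetic mean: the sum of the n scores divided by n."""
    return sum(X) / len(X)


def variance(X):
    """Sample variance S^2: the sum of squared deviations divided by n - 1."""
    m = mean(X)
    return sum((v - m) ** 2 for v in X) / (len(X) - 1)


def sd(X):
    """Sample standard deviation S: the square root of the variance."""
    return math.sqrt(variance(X))


def corr(X, Z):
    """Product-moment correlation: sum of deviation cross-products / ((n - 1) S_X S_Z)."""
    mx, mz = mean(X), mean(Z)
    cross = sum((u - mx) * (v - mz) for u, v in zip(X, Z))
    return cross / ((len(X) - 1) * sd(X) * sd(Z))


# Hypothetical data, made up for illustration only.
X1 = [2, 4, 5, 7, 9]
X2 = [1, 3, 2, 5, 4]
print(variance(X1), statistics.variance(X1))  # the standard library also divides by n - 1
print(sd(X2), statistics.stdev(X2))
print(corr(X1, X2))
```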
1. In the first normal equation, we recognize that, on the right hand side,

$$\sum x_1 = \sum (X_1 - \bar{X}_1) = 0, \qquad \sum x_2 = \sum (X_2 - \bar{X}_2) = 0$$

2. In the second normal equation, we can see that:

$$\sum x_1 = 0$$

In addition, $\sum x_1^2 = \sum (X_1 - \bar{X}_1)^2$, and the sample variance, $S_1^2$, is:

$$S_1^2 = \frac{\sum (X_1 - \bar{X}_1)^2}{n-1}, \quad \text{or} \quad (n-1) S_1^2 = \sum (X_1 - \bar{X}_1)^2 = \sum x_1^2$$

This may be substituted for $\sum x_1^2$.

As for $\sum x_1 x_2$, we can use the definition of the sample correlation between $x_1$ and $x_2$ to simplify this term. By definition, for samples:

$$r_{12} = \frac{\sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2)}{(n-1) S_1 S_2} = \frac{\sum x_1 x_2}{(n-1) S_1 S_2}$$

Hence $(n-1) r_{12} S_1 S_2 = \sum x_1 x_2$. This may be substituted.
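Finally, $\sum x_1 Y$ may be simplified as follows: $\sum x_1 Y = \sum (X_1 - \bar{X}_1) Y$. Now, $\sum (X_1 - \bar{X}_1) Y$ is identical to $\sum (X_1 - \bar{X}_1)(Y - \bar{Y})$ (see the proof below), or $\sum x_1 Y = \sum x_1 y$ (where $y = Y - \bar{Y}$). This is recognized to be the numerator of the correlation between $x_1$ and $Y$ ($r_{y1}$ or $r_{1y}$). Hence,

$$r_{y1} = \frac{\sum x_1 Y}{(n-1) S_1 S_y}, \quad \text{or} \quad \sum x_1 Y = (n-1) r_{y1} S_y S_1$$

This may be substituted into the second normal equation.

PROOF:

$$\sum (X_1 - \bar{X}_1) Y = \sum (X_1 Y - \bar{X}_1 Y) = \sum X_1 Y - \bar{X}_1 \sum Y = \sum X_1 Y - n \bar{X}_1 \bar{Y}$$

Now

$$\begin{aligned}
\sum (X_1 - \bar{X}_1)(Y - \bar{Y}) &= \sum (X_1 Y - X_1 \bar{Y} - \bar{X}_1 Y + \bar{X}_1 \bar{Y}) \\
&= \sum X_1 Y - \bar{Y} \sum X_1 - \bar{X}_1 \sum Y + n \bar{X}_1 \bar{Y} \\
&= \sum X_1 Y - \bar{Y}(n \bar{X}_1) - \bar{X}_1 (n \bar{Y}) + n \bar{X}_1 \bar{Y} \\
&= \sum X_1 Y - n \bar{X}_1 \bar{Y}
\end{aligned}$$

Therefore, $\sum (X_1 - \bar{X}_1) Y = \sum (X_1 - \bar{X}_1)(Y - \bar{Y})$. End of proof.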
3. For the third normal equation (the one with $\sum x_2 Y$ on the left), we can immediately write down the following simplifications: $\sum x_2 = 0$, and

$$\sum x_2 Y = (n-1) r_{y2} S_y S_2$$

In addition,

$$\sum x_1 x_2 = (n-1) r_{12} S_1 S_2, \qquad \sum x_2^2 = (n-1) S_2^2$$
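The identity proved above, $\sum (X_1 - \bar{X}_1) Y = \sum (X_1 - \bar{X}_1)(Y - \bar{Y})$, can be confirmed on any data set. The sketch below uses made-up scores and Python's `fractions.Fraction` so that the comparison is exact rather than subject to floating-point rounding; the third quantity is the expanded form $\sum X_1 Y - n \bar{X}_1 \bar{Y}$ reached in both halves of the proof.

```python
from fractions import Fraction

# Hypothetical data, made up for illustration; Fraction keeps the arithmetic exact.
X1 = [Fraction(v) for v in (2, 4, 5, 7, 9)]
Y = [Fraction(v) for v in (3, 6, 7, 10, 12)]
n = len(Y)
X1_bar = sum(X1) / n
Y_bar = sum(Y) / n

lhs = sum((u - X1_bar) * y for u, y in zip(X1, Y))                 # sum (X1 - X1bar) Y
rhs = sum((u - X1_bar) * (y - Y_bar) for u, y in zip(X1, Y))       # sum (X1 - X1bar)(Y - Ybar)
expanded = sum(u * y for u, y in zip(X1, Y)) - n * X1_bar * Y_bar  # sum X1 Y - n X1bar Ybar
print(lhs, rhs, expanded)  # 189/5 189/5 189/5
```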



Making all these substitutions, we arrive at a simplified set of the originally stated normal equations:

$$\begin{aligned}
\sum Y &= na + b_1(0) + b_2(0) \\
(n-1) r_{y1} S_y S_1 &= a(0) + b_1 (n-1) S_1^2 + b_2 (n-1) r_{12} S_1 S_2 \\
(n-1) r_{y2} S_y S_2 &= a(0) + b_1 (n-1) r_{12} S_1 S_2 + b_2 (n-1) S_2^2
\end{aligned}$$

To simplify further, eliminate zero terms, and for the last two normal equations divide each term by $(n-1)$. This gives us:

$$\begin{aligned}
\sum Y &= na \\
r_{y1} S_y S_1 &= b_1 S_1^2 + b_2 r_{12} S_1 S_2 \\
r_{y2} S_y S_2 &= b_1 r_{12} S_1 S_2 + b_2 S_2^2
\end{aligned}$$
As a final simplification, we can divide through the first equation by $n$:

$$\begin{aligned}
\bar{Y} &= a \\
r_{y1} S_y S_1 &= b_1 S_1^2 + b_2 r_{12} S_1 S_2 \\
r_{y2} S_y S_2 &= b_1 r_{12} S_1 S_2 + b_2 S_2^2
\end{aligned}$$

These are the normal equations we want to work with in the derivation for two predictors. For the reader's convenience in working through the derivation, we will restate them prior to the derivation.
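With only two slopes, the simplified normal equations form a 2 x 2 system that can be solved by Cramer's rule. The sketch below (hypothetical data; helper names are ours) computes $S_1$, $S_2$, $S_y$, $r_{12}$, $r_{y1}$, and $r_{y2}$ from raw scores using the Table 1 formulas, solves for $b_1$ and $b_2$, sets $a = \bar{Y}$, and confirms that the resulting residuals satisfy the original (unsimplified) normal equations.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Sample standard deviation (n - 1 divisor), as in Table 1."""
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))


def r(u, v):
    """Product-moment correlation, as in Table 1."""
    mu, mv = mean(u), mean(v)
    cross = sum((s - mu) * (t - mv) for s, t in zip(u, v))
    return cross / ((len(u) - 1) * sd(u) * sd(v))


# Hypothetical data, made up for illustration only.
X1 = [2, 4, 5, 7, 9]
X2 = [1, 3, 2, 5, 4]
Y = [3, 6, 7, 10, 12]

S1, S2, Sy = sd(X1), sd(X2), sd(Y)
r12, ry1, ry2 = r(X1, X2), r(Y, X1), r(Y, X2)

# r_y1 S_y S_1 = b1 S_1^2         + b2 r_12 S_1 S_2
# r_y2 S_y S_2 = b1 r_12 S_1 S_2  + b2 S_2^2
c11, c12, c22 = S1 ** 2, r12 * S1 * S2, S2 ** 2
d1, d2 = ry1 * Sy * S1, ry2 * Sy * S2
det = c11 * c22 - c12 ** 2
b1 = (d1 * c22 - c12 * d2) / det  # Cramer's rule
b2 = (c11 * d2 - c12 * d1) / det
a = mean(Y)                       # first normal equation: Y-bar = a

x1 = [t - mean(X1) for t in X1]
x2 = [t - mean(X2) for t in X2]
e = [y - (a + b1 * u + b2 * v) for y, u, v in zip(Y, x1, x2)]
print(a, b1, b2)
```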



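Multiple Correlation

We are now ready to define the multiple correlation for one criterion and two predictors. By definition:*

$$\begin{aligned}
R_{Y\cdot x_1,x_2} &= \operatorname{corr}(Y, \hat{Y}) = \operatorname{corr}(Y, a + b_1 x_1 + b_2 x_2) \\
&= \frac{\operatorname{cov}(Y, \hat{Y})}{\sqrt{\operatorname{var}(Y)\operatorname{var}(\hat{Y})}} \\
&= \frac{\operatorname{cov}(Y, a + b_1 x_1 + b_2 x_2)}{\sqrt{\operatorname{var}(Y)}\sqrt{\operatorname{var}(a + b_1 x_1 + b_2 x_2)}}
\end{aligned}$$

where:

corr means correlation,
cov means covariance, and
var means variance.

* Alternative notation systems use $R$ or $R_{Y\cdot 12}$, among others.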
It is important to remember that $a$, $b_1$, and $b_2$ function as constants. Elementary covariance and variance operations performed on the above correlation formula yield in the first step:

$$R_{Y\cdot x_1,x_2} = \frac{\operatorname{cov}(Y,a) + \operatorname{cov}(Y,b_1 x_1) + \operatorname{cov}(Y,b_2 x_2)}{\sqrt{\operatorname{var}(Y)}\sqrt{\operatorname{var}(a) + \operatorname{var}(b_1 x_1) + \operatorname{var}(b_2 x_2) + 2\operatorname{cov}(a,b_1 x_1) + 2\operatorname{cov}(a,b_2 x_2) + 2\operatorname{cov}(b_1 x_1,b_2 x_2)}}$$


Applying rules of covariance and variance for variables and constants, we can achieve further simplification.* This is done below.

* To briefly review: the variance of any constant is zero; the variance of a product term containing a constant yields the squared constant times the variance of the variable. For example,

$$\operatorname{var}(b_1 x_1) = b_1^2 \operatorname{var}(x_1)$$

When a covariance term contains constants, factor the constants outside the covariance operator (sometimes this reduces the covariance to zero). For example,

$$\operatorname{cov}(a, b_1 x_1) = a b_1 \operatorname{cov}(1, x_1) = 0, \quad \text{but} \quad \operatorname{cov}(b_1 x_1, b_2 x_2) = b_1 b_2 \operatorname{cov}(x_1, x_2)$$

By definition, the covariance is related to the simple correlation. For example,

$$\operatorname{cov}(x_1, x_2) = r_{12} S_1 S_2$$

This should appear correct since, by definition,

$$r_{12} = \frac{\operatorname{cov}(x_1, x_2)}{\sqrt{\operatorname{var}(x_1)}\sqrt{\operatorname{var}(x_2)}}$$

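$$R_{Y\cdot x_1,x_2} = \frac{0 + b_1 \operatorname{cov}(Y, x_1) + b_2 \operatorname{cov}(Y, x_2)}{S_y \sqrt{0 + b_1^2 \operatorname{var}(x_1) + b_2^2 \operatorname{var}(x_2) + 0 + 0 + 2 b_1 b_2 \operatorname{cov}(x_1, x_2)}}$$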



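As mentioned, by definition:

$$\operatorname{cov}(Y, x_1) = r_{y1} S_y S_1, \qquad \operatorname{cov}(Y, x_2) = r_{y2} S_y S_2, \qquad \operatorname{cov}(x_1, x_2) = r_{12} S_1 S_2$$

One further observation should be made with respect to the variance of the predictors. For example, the variance of $x_1$ is:

$$\operatorname{var}(x_1) = \operatorname{var}(X_1 - \bar{X}_1)$$

By definition, the variance of this difference is:

$$\operatorname{var}(X_1) + \operatorname{var}(\bar{X}_1) - 2\operatorname{cov}(X_1, \bar{X}_1)$$

Since $\bar{X}_1$ is a constant,

$$\operatorname{var}(x_1) = \operatorname{var}(X_1) + 0 - 0 = S_1^2$$

Similar results obtain for $\operatorname{var}(x_2)$. Therefore, when all substitutions are made: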

$$R_{Y\cdot x_1,x_2} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}$$

This is the form of the multiple R we will use in the derivation. It will be restated for the reader's convenience.
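The covariance and variance rules used in reaching this form are easy to check numerically. The sketch below (made-up data and arbitrary constants; function names are ours) computes sample covariances with the $n - 1$ divisor and confirms each rule: the variance of a constant is zero, $\operatorname{var}(b_1 x_1) = b_1^2 \operatorname{var}(x_1)$, $\operatorname{cov}(a, b_1 x_1) = 0$, $\operatorname{cov}(b_1 x_1, b_2 x_2) = b_1 b_2 \operatorname{cov}(x_1, x_2)$, $\operatorname{cov}(x_1, x_2) = r_{12} S_1 S_2$, and $\operatorname{var}(X_1 - \bar{X}_1) = \operatorname{var}(X_1)$.

```python
import math


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Sample covariance with the n - 1 divisor."""
    mu, mv = mean(u), mean(v)
    return sum((s - mu) * (t - mv) for s, t in zip(u, v)) / (len(u) - 1)


def var(u):
    """Sample variance: the covariance of a variable with itself."""
    return cov(u, u)


# Hypothetical data and constants, made up for illustration only.
X1 = [2, 4, 5, 7, 9]
X2 = [1, 3, 2, 5, 4]
a, b1, b2 = 0.5, 1.2, -0.7

x1 = [t - mean(X1) for t in X1]  # deviation scores
x2 = [t - mean(X2) for t in X2]
const = [a] * len(x1)            # a constant treated as a "variable"
b1x1 = [b1 * t for t in x1]
b2x2 = [b2 * t for t in x2]
S1, S2 = math.sqrt(var(x1)), math.sqrt(var(x2))
r12 = cov(x1, x2) / (S1 * S2)

print(var(const), cov(const, b1x1))            # both zero
print(var(b1x1), b1 ** 2 * var(x1))            # equal
print(cov(b1x1, b2x2), b1 * b2 * cov(x1, x2))  # equal
print(var(x1), var(X1))                        # equal
```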
Derivation

The following formula for one criterion and two predictors appears in many applied statistics textbooks:

$$R_{Y\cdot x_1,x_2} = \frac{\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}}{S_y}$$

We are now able to show its derivation.

For the reader's convenience, a restatement of the simplified set of normal equations and the multiple R formula is given in Table 2.

The derivation involves two steps: (a) substitute the normal equations into the numerator of the multiple R formula, and (b) simplify algebraically. The derivation follows Table 2.
Table 2

Normal Equations and Multiple Correlation Formula for Two Raw Score Predictors

Normal Equations*

$$\begin{aligned}
r_{y1} S_y S_1 &= b_1 S_1^2 + b_2 r_{12} S_1 S_2 \\
r_{y2} S_y S_2 &= b_1 r_{12} S_1 S_2 + b_2 S_2^2
\end{aligned}$$

Multiple Correlation

$$R_{Y\cdot x_1,x_2} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}$$

* The $\bar{Y} = a$ term is omitted because it contributes nothing but zero terms to the derivation.

NOTE: The proof involves substituting the normal equations into the numerator of the multiple R formula and simplifying. See text for details.
Notice that the numerator of the multiple R formula contains the terms $r_{y1} S_y S_1$ and $r_{y2} S_y S_2$. These terms are exactly the left-hand sides of the normal equations. If we substitute the normal equations for each term in R and rearrange terms, we obtain the following results:

$$\begin{aligned}
R_{Y\cdot x_1,x_2} &= \frac{b_1(b_1 S_1^2 + b_2 r_{12} S_1 S_2) + b_2(b_1 r_{12} S_1 S_2 + b_2 S_2^2)}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}} \\
&= \frac{b_1^2 S_1^2 + b_1 b_2 r_{12} S_1 S_2 + b_1 b_2 r_{12} S_1 S_2 + b_2^2 S_2^2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}} \\
&= \frac{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}
\end{aligned}$$

(Hence, $b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 = b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2$.)

Now, the denominator can be simplified algebraically if we remember the rules for radicals and the laws of exponents. Let the quantity under the radical in the denominator be called $A$; note that $A$ is also the numerator. Thus, the structure of the multiple R is:

$$R = \frac{A}{S_y \sqrt{A}}$$

Recall the following permissible operation (rationalizing the denominator):

$$\frac{A}{\sqrt{A}} = \frac{A\sqrt{A}}{\sqrt{A}\sqrt{A}} = \frac{A\sqrt{A}}{A} = \sqrt{A}, \quad \text{so} \quad R = \frac{\sqrt{A}}{S_y}$$
Simplifying:

$$R_{Y\cdot x_1,x_2} = \frac{\sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}{S_y}$$

Therefore,

$$R_{Y\cdot x_1,x_2} = \frac{\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}}{S_y}$$

END OF PROOF
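As a numerical sanity check of the result just proved, the sketch below (hypothetical data; helper names are ours) obtains $b_1$ and $b_2$ from the simplified normal equations in Table 2 and compares three quantities that the derivation says are equal: the textbook formula $\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}/S_y$, the intermediate form $\sqrt{A}/S_y$, and the defining correlation $\operatorname{corr}(Y, \hat{Y})$. The equality relies on $b_1$ and $b_2$ being the least squares weights; arbitrary weights would not satisfy the normal equations.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))


def r(u, v):
    mu, mv = mean(u), mean(v)
    cross = sum((s - mu) * (t - mv) for s, t in zip(u, v))
    return cross / ((len(u) - 1) * sd(u) * sd(v))


# Hypothetical data, made up for illustration only.
X1 = [2, 4, 5, 7, 9]
X2 = [1, 3, 2, 5, 4]
Y = [3, 6, 7, 10, 12]
S1, S2, Sy = sd(X1), sd(X2), sd(Y)
r12, ry1, ry2 = r(X1, X2), r(Y, X1), r(Y, X2)

# Least squares weights from the simplified normal equations (Cramer's rule).
c11, c12, c22 = S1 ** 2, r12 * S1 * S2, S2 ** 2
d1, d2 = ry1 * Sy * S1, ry2 * Sy * S2
det = c11 * c22 - c12 ** 2
b1 = (d1 * c22 - c12 * d2) / det
b2 = (c11 * d2 - c12 * d1) / det

R_formula = math.sqrt(b1 * ry1 * Sy * S1 + b2 * ry2 * Sy * S2) / Sy
A = b1 ** 2 * S1 ** 2 + b2 ** 2 * S2 ** 2 + 2 * b1 * b2 * r12 * S1 * S2
R_from_A = math.sqrt(A) / Sy

x1 = [t - mean(X1) for t in X1]
x2 = [t - mean(X2) for t in X2]
Y_hat = [mean(Y) + b1 * u + b2 * v for u, v in zip(x1, x2)]
R_definition = r(Y, Y_hat)
print(R_formula, R_from_A, R_definition)
```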



* For readers familiar with the 1982c paper, it is possible to obtain a "cheap" proof of the analogous standard score regression result. If variables are in standard score form, then the standard deviations become unity: $S_y = S_1 = S_2 = 1$. Thus, in the notation of the 1982c paper,

$$R_{z_y\cdot z_1,z_2} = \sqrt{b_1 r_{y1} + b_2 r_{y2}}$$

where $b_1$ and $b_2$ here denote the weights of the standard score model.
Derivation for Three Predictors

Let us now work out the derivation for a three predictor raw score linear regression model. This will allow us to review the logic and procedures of the derivation. We will also introduce the use of summation, which becomes necessary for the general case of p predictors.

The first step is to state the regression model. For three predictors:

$$\hat{Y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3$$

We have simply added an independent variable to our prediction (idealized) mathematical model to form a four dimensional model ($Y$ and three predictors with their associated slope terms).

As in the two predictor model, we make use of the least squares criterion to establish our goal of minimizing the prediction error:

$$\sum (Y - \hat{Y})^2 = \sum (Y - a - b_1 x_1 - b_2 x_2 - b_3 x_3)^2 = \sum e^2 = \text{a minimum}$$

The next step is the application of partial differentiation to find the derivatives of the criterion with respect to each of the constants in the prediction model ($a$, $b_1$, $b_2$, and $b_3$). This procedure produces the set of normal equations. Appendix A shows the procedures involved. Omitting the cumbersome algebra involved in simplifying the original set of normal equations, we can state the final, simplified set of normal equations as follows:
$$\begin{aligned}
\bar{Y} &= a \\
r_{y1} S_y S_1 &= b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3 \\
r_{y2} S_y S_2 &= b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3 \\
r_{y3} S_y S_3 &= b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2
\end{aligned}$$

Recall that the value of $a$ is determined in practice, but it plays no role in the derivation since it "drops out" in the covariance and variance operations of the multiple R derivation.

The above normal equations are the ones we will make use of in the derivation of the multiple R formula for three predictors. A restatement of them is presented in Table 3 for easy reference.

The third step is to define the multiple correlation of one criterion and three raw score predictors. Rules of covariance and variance algebra will allow us to simplify the definitional form of R. The multiple R is defined as follows:*

* The term $a$ is omitted. For justification, the reader may want to include it in the definition of R and ascertain the result.
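$$\begin{aligned}
R_{Y\cdot x_1,x_2,x_3} &= \operatorname{corr}(Y, \hat{Y}) = \operatorname{corr}(Y, b_1 x_1 + b_2 x_2 + b_3 x_3) \\
&= \frac{\operatorname{cov}(Y, \hat{Y})}{\sqrt{\operatorname{var}(Y)\operatorname{var}(\hat{Y})}} \\
&= \frac{\operatorname{cov}(Y, b_1 x_1 + b_2 x_2 + b_3 x_3)}{\sqrt{\operatorname{var}(Y)}\sqrt{\operatorname{var}(b_1 x_1 + b_2 x_2 + b_3 x_3)}}
\end{aligned}$$

All of the above forms state equivalent ways to define the multiple R. The last is amenable to operations of covariance and variance. Applying rules of covariance and variance algebra:

$$\begin{aligned}
R_{Y\cdot x_1,x_2,x_3} &= \frac{\operatorname{cov}(Y,b_1 x_1) + \operatorname{cov}(Y,b_2 x_2) + \operatorname{cov}(Y,b_3 x_3)}{\sqrt{\operatorname{var}(Y)}\sqrt{\operatorname{var}(b_1 x_1) + \operatorname{var}(b_2 x_2) + \operatorname{var}(b_3 x_3) + 2\operatorname{cov}(b_1 x_1, b_2 x_2) + 2\operatorname{cov}(b_1 x_1, b_3 x_3) + 2\operatorname{cov}(b_2 x_2, b_3 x_3)}} \\
&= \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3}}
\end{aligned}$$

This is as far as we can simplify the multiple R at this point. We will retain this form; it is restated in Table 3.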
Table 3

Normal Equations and Multiple Correlation Formula for Three Raw Score Predictors

Normal Equations*

$$\begin{aligned}
r_{y1} S_y S_1 &= b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3 \\
r_{y2} S_y S_2 &= b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3 \\
r_{y3} S_y S_3 &= b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2
\end{aligned}$$

Multiple Correlation

$$R_{Y\cdot x_1,x_2,x_3} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3}}$$

* Again we note that the term $a$ ($= \bar{Y}$) is omitted from the normal equations and the multiple R.

NOTE: The derivation involves substituting the normal equations into the multiple R and simplifying. See the text for details.
We have stated the multiple regression model and least squares criterion, and presented the normal equations and the multiple R formula. The fourth step is to substitute the normal equations into the multiple R.

If we substitute each of the normal equations for the appropriate terms in the numerator of R, we obtain (see Table 3):

$$\begin{aligned}
\operatorname{cov}(Y, \hat{Y}) &= b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3 \\
&= b_1(b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3) + b_2(b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3) \\
&\quad + b_3(b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2) \\
&= (b_1^2 S_1^2 + b_1 b_2 r_{12} S_1 S_2 + b_1 b_3 r_{13} S_1 S_3) + (b_1 b_2 r_{12} S_1 S_2 + b_2^2 S_2^2 + b_2 b_3 r_{23} S_2 S_3) \\
&\quad + (b_1 b_3 r_{13} S_1 S_3 + b_2 b_3 r_{23} S_2 S_3 + b_3^2 S_3^2)
\end{aligned}$$

Now, let us write each parenthesized term on a separate line to form a covariance matrix:

$$\begin{aligned}
\operatorname{cov}(Y, \hat{Y}) = {} & b_1^2 S_1^2 + b_1 b_2 r_{12} S_1 S_2 + b_1 b_3 r_{13} S_1 S_3 \\
+ {} & b_1 b_2 r_{12} S_1 S_2 + b_2^2 S_2^2 + b_2 b_3 r_{23} S_2 S_3 \\
+ {} & b_1 b_3 r_{13} S_1 S_3 + b_2 b_3 r_{23} S_2 S_3 + b_3^2 S_3^2
\end{aligned}$$
At this point we will introduce summation to simplify the algebra. Consider the three squared terms along the northwest to southeast diagonal of the covariance matrix. It is clear that we might express these terms in summation as follows:

$$b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 = \sum_{j=1}^{3} b_j^2 S_j^2$$

The remaining six terms in the matrix consist of three pairs of identical quantities:

$$2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3$$

One common way to express this in summation is as follows:*

$$2(b_1 b_2 r_{12} S_1 S_2 + b_1 b_3 r_{13} S_1 S_3 + b_2 b_3 r_{23} S_2 S_3) = 2 \sum_{j=2}^{3} \sum_{\substack{i=1 \\ i<j}}^{2} b_i b_j r_{ij} S_i S_j$$

* One of several forms often seen in multivariate statistics textbooks is as follows:

$$2 \sum_{i<j} b_i b_j r_{ij} S_i S_j$$
The total number of terms to be summed is determined by multiplying the upper limits (3 x 2 = 6). In the double summation, set $i$ to 1 and let $j$ run over its limits ($j = 2, 3$), giving the subscripts $ij = 12$ and $13$. Now increment $i$ to 2 and again let $j$ run over its limits, with the side condition that $i < j$ (e.g., $ij = 22$ is not permitted), giving $23$. The subscripts that result from all of the summation operations are $12$, $13$, and $23$. Each pair, of course, is taken twice (hence the factor 2), for 6 terms in all.
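The bookkeeping for the double summation can be checked mechanically. The short sketch below (the function names `upper_pairs` and `pairs_j_outer` are ours) lists the subscript pairs with $i < j$ in the order just described and confirms that, with each pair taken twice, the number of off-diagonal terms is $p(p - 1)$, which is 6 when $p = 3$. It also confirms that writing the sums with $j$ outside and $i$ inside produces the same set of pairs.

```python
def upper_pairs(p):
    """Subscript pairs (i, j) with i < j: fix i, then let j run over i+1, ..., p."""
    return [(i, j) for i in range(1, p) for j in range(i + 1, p + 1)]


def pairs_j_outer(p):
    """The same pairs generated as: sum over j = 2..p of sum over i = 1..j-1."""
    return [(i, j) for j in range(2, p + 1) for i in range(1, j)]


pairs = upper_pairs(3)
print(pairs)  # [(1, 2), (1, 3), (2, 3)]

for p in range(2, 8):
    n_pairs = len(upper_pairs(p))
    # Each pair is taken twice, so the off-diagonal terms number p * (p - 1).
    print(p, n_pairs, 2 * n_pairs, p * (p - 1))
```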




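Thus, the nine covariance terms of the multiple R numerator can be written in all of the following ways:

$$\begin{aligned}
\operatorname{cov}(Y, \hat{Y}) &= b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3 \\
&= \sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{\substack{i=1 \\ i<j}}^{2} b_i b_j r_{ij} S_i S_j \\
&= b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3 = \sum_{j=1}^{3} b_j r_{yj} S_y S_j
\end{aligned}$$

This last equation is simply a restatement of the multiple R numerator from Table 3. The second equation was just derived from the first equation.

Turning to the denominator of the multiple R in Table 3, it is readily apparent that it is similar to the covariance term above. That is:

$$\begin{aligned}
\sqrt{\operatorname{var}(Y)}\sqrt{\operatorname{var}(\hat{Y})} &= S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3} \\
&= S_y \sqrt{\sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{\substack{i=1 \\ i<j}}^{2} b_i b_j r_{ij} S_i S_j}
\end{aligned}$$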
If we now form the ratio of covariance and variance terms for the multiple R, we can complete the derivation for three predictors:

$$R_{Y\cdot x_1,x_2,x_3} = \frac{\sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{\substack{i=1 \\ i<j}}^{2} b_i b_j r_{ij} S_i S_j}{S_y \sqrt{\sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{\substack{i=1 \\ i<j}}^{2} b_i b_j r_{ij} S_i S_j}}$$

Notice that the numerator and the denominator (under the radical) are identical in form. If we make the same algebraic simplification we made for the two predictor derivation, we obtain:

$$R_{Y\cdot x_1,x_2,x_3} = \frac{\sqrt{\sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{\substack{i=1 \\ i<j}}^{2} b_i b_j r_{ij} S_i S_j}}{S_y} = \frac{\sqrt{\sum_{j=1}^{3} b_j r_{yj} S_y S_j}}{S_y}$$

END OF PROOF

This completes the derivation for three predictors. We now derive the multiple R for any admissible (finite) number of predictors in the linear regression model.
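The three-predictor result can be checked the same way. The sketch below uses hypothetical data with six cases; the helper names and the small Gaussian elimination routine are conveniences of ours, not part of the paper. It solves the three simplified normal equations of Table 3 and compares the summation form $\sqrt{\sum_j b_j r_{yj} S_y S_j}/S_y$, the form with the squared terms plus the double sum over $i < j$, and the defining correlation $\operatorname{corr}(Y, \hat{Y})$.

```python
import math


def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(t)] for row, t in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def mean(v):
    return sum(v) / len(v)


def sd(v):
    m = mean(v)
    return math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))


def r(u, v):
    mu, mv = mean(u), mean(v)
    cross = sum((s - mu) * (t - mv) for s, t in zip(u, v))
    return cross / ((len(u) - 1) * sd(u) * sd(v))


# Hypothetical data with three predictors, made up for illustration only.
X = [[2, 4, 5, 7, 9, 10],  # X1
     [1, 3, 2, 5, 4, 6],   # X2
     [7, 5, 6, 2, 3, 1]]   # X3
Y = [3, 6, 7, 10, 12, 13]
p = len(X)

S = [sd(col) for col in X]
Sy = sd(Y)
ry = [r(Y, col) for col in X]
rr = [[1.0 if i == j else r(X[i], X[j]) for j in range(p)] for i in range(p)]

# Simplified normal equations (Table 3): row j is sum_i b_i r_ij S_i S_j = r_yj S_y S_j.
C = [[rr[i][j] * S[i] * S[j] for i in range(p)] for j in range(p)]
b = solve(C, [ry[j] * Sy * S[j] for j in range(p)])

R_sum = math.sqrt(sum(b[j] * ry[j] * Sy * S[j] for j in range(p))) / Sy
quad = sum(b[j] ** 2 * S[j] ** 2 for j in range(p)) + 2 * sum(
    b[i] * b[j] * rr[i][j] * S[i] * S[j] for j in range(1, p) for i in range(j))
R_quad = math.sqrt(quad) / Sy

x = [[t - mean(col) for t in col] for col in X]  # deviation scores
Y_hat = [mean(Y) + sum(b[j] * x[j][k] for j in range(p)) for k in range(len(Y))]
R_def = r(Y, Y_hat)
print(R_sum, R_quad, R_def)
```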
Derivation for p Predictors

The derivation of the multiple correlation formula for any number of predictors will be presented as a generalization of the two and three predictor cases. A rigorous mathematical proof that the generalization holds for p predictors could be provided by mathematical induction. Our approach in this section is a straightforward multivariate generalization.

For reference, the following is a listing of the general steps for the p predictor variable case:

1. state the regression model for p predictors
2. derive the normal equations (see Appendix A)
3. define the multiple R
4. substitute the normal equations into the numerator of R
5. express the covariance term in summation
6. express the variance term in summation
7. simplify

The linear regression model is:

$$\hat{Y} = a + b_1 x_1 + b_2 x_2 + \cdots + b_j x_j + \cdots + b_p x_p$$

The least squares criterion is:

$$\sum (Y - \hat{Y})^2 = \sum e^2 = \text{a minimum}$$

Substituting for $\hat{Y}$:

$$\sum (Y - a - b_1 x_1 - b_2 x_2 - \cdots - b_j x_j - \cdots - b_p x_p)^2 = \sum e^2 = \text{a minimum}$$

Next we derive the normal equations. In unsimplified form we have:*

* Note that in the normal equations for $\sum x_1 Y$, $\sum x_2 Y$, etc., cross-product terms are written such that the first subscript is always less than the second one (for example, $\sum x_1 x_2$ rather than $\sum x_2 x_1$). Since these products are symmetric ($\sum x_1 x_2 = \sum x_2 x_1$, etc.), this method simplifies the algebra. See Appendix A for more detail.
$$\begin{aligned}
\sum Y &= na + b_1 \sum x_1 + b_2 \sum x_2 + b_3 \sum x_3 + \cdots + b_j \sum x_j + \cdots + b_p \sum x_p \\
\sum x_1 Y &= a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + b_3 \sum x_1 x_3 + \cdots + b_j \sum x_1 x_j + \cdots + b_p \sum x_1 x_p \\
\sum x_2 Y &= a \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2 + b_3 \sum x_2 x_3 + \cdots + b_j \sum x_2 x_j + \cdots + b_p \sum x_2 x_p \\
\sum x_3 Y &= a \sum x_3 + b_1 \sum x_1 x_3 + b_2 \sum x_2 x_3 + b_3 \sum x_3^2 + \cdots + b_j \sum x_3 x_j + \cdots + b_p \sum x_3 x_p \\
&\;\;\vdots \\
\sum x_p Y &= a \sum x_p + b_1 \sum x_1 x_p + b_2 \sum x_2 x_p + b_3 \sum x_3 x_p + \cdots + b_j \sum x_j x_p + \cdots + b_p \sum x_p^2
\end{aligned}$$

If we apply the same logic and make the same substitutions we made for 2 and 3 predictors, we obtain a simplified set of normal equations:

$$\begin{aligned}
r_{y1} S_y S_1 &= b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3 + \cdots + b_j r_{1j} S_1 S_j + \cdots + b_p r_{1p} S_1 S_p \\
r_{y2} S_y S_2 &= b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3 + \cdots + b_j r_{2j} S_2 S_j + \cdots + b_p r_{2p} S_2 S_p \\
r_{y3} S_y S_3 &= b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2 + \cdots + b_j r_{3j} S_3 S_j + \cdots + b_p r_{3p} S_3 S_p \\
&\;\;\vdots \\
r_{yp} S_y S_p &= b_1 r_{1p} S_1 S_p + b_2 r_{2p} S_2 S_p + b_3 r_{3p} S_3 S_p + \cdots + b_j r_{jp} S_j S_p + \cdots + b_p S_p^2
\end{aligned}$$

A restatement of the normal equations is given in Table 4.
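For any p, the simplified normal equations can be assembled from deviation scores, since by the Table 1 definitions $\sum x_i x_j/(n-1) = r_{ij} S_i S_j$ and $\sum x_j Y/(n-1) = r_{yj} S_y S_j$. The sketch below is one way to build and solve them for arbitrary p, shown with four made-up predictors; the helper names and the Gaussian elimination routine are our own assumptions, not the paper's. After solving, it checks that each residual cross-product $\sum e\, x_j$ vanishes, which is what the normal equations require.

```python
def solve(A, rhs):
    """Solve the square system A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(t)] for row, t in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for c in range(col, n + 1):
                M[k][c] -= f * M[col][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def center(v):
    m = sum(v) / len(v)
    return [t - m for t in v]


def dot(u, v):
    return sum(s * t for s, t in zip(u, v))


def simplified_normal_equations(X, Y):
    """Return (C, d, x) for the system C b = d.

    C[j][i] = sum(x_i x_j) / (n - 1), i.e. r_ij S_i S_j (S_j**2 when i == j);
    d[j]    = sum(x_j Y) / (n - 1), i.e. r_yj S_y S_j;
    x       = deviation scores of the predictors.
    """
    n = len(Y)
    x = [center(col) for col in X]
    C = [[dot(xi, xj) / (n - 1) for xi in x] for xj in x]
    d = [dot(xj, Y) / (n - 1) for xj in x]
    return C, d, x


# Hypothetical data with p = 4 predictors, made up for illustration only.
X = [[2, 4, 5, 7, 9, 10, 3],
     [1, 3, 2, 5, 4, 6, 2],
     [7, 5, 6, 2, 3, 1, 4],
     [3, 3, 8, 1, 6, 2, 5]]
Y = [3, 6, 7, 10, 12, 13, 5]

C, d, x = simplified_normal_equations(X, Y)
b = solve(C, d)
a = sum(Y) / len(Y)  # the equation for a reduces to a = Y-bar
e = [Y[k] - a - sum(b[j] * x[j][k] for j in range(len(X))) for k in range(len(Y))]
print(b)
print([dot(e, xj) for xj in x])  # each entry is zero up to rounding
```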
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<> ■/ 30. 

- i 

Multiple- Correlation for p. £re<l I c tors and Derivation - 

We are now ready to derive the multiple correlation formula for p predictors. See Table $ 
for a statement of the definition i>f the multiple R. : 

v 

1 

The covariance term is: . 



cov(Y,a + b.x. + b^x„ + b,xJ+...+ b.x, +...+ b-x-) 



b.r..S S. + :, n r ,S S, + Lr.J.S. +...+ b.r ;S S. +...+ b r. S S 

1 yl y 1 2- y2 v 2 3 y3 y 3 j yj y ] p yp y p 



Now, substitute the normal equations (line for line— see Table 4): 

covitj) • h.(h,S; T b,r ]2 S lSj i..4 b.r lp S lSp ) I yb,.^ T ^ +...+ b .r^) +...♦ 

V :b i c i,. s iV b 2V2'V" J VF 2) - < , 

Multiply each of the b, terms inside the parentheses and write each parenthesized sum on a separate line 

... A. . 

cov(Y,Y) = b,r ,S S. + kj~<$ S, + b.r .S S„ +;,.+ b- r S S- 
I yl y 1 I y2 y 2 < 3 y3 y 3 p yp y p 

t =,b 2 r 1 ,5 1? , + V J r 1 .S 1 S 3+ ;;; + l» 1 y 1|! 8.S ? + 
b l b 2 r 12 S l S 2 + b 2 S 2 +b 2 b ] r 23 S 2V"' +b 2 b j r 2j S 2 S i + 

bb 3 r l3 S l S 3 + ' b 2 b 3 r 23¥ 3 + B 3 S 3 +; " + W> + " 



b b r-S-S- + M r 0 S 0 S + b 0 r r S.S +...+ b.b.r. S S + ...+ b S 
1 D ip ] p 2 p 2p 2 p -3 3p 3 p ] p jp j p P p 

For reasons presented earlier, the term a is omitted in the derivation. 
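The substitution step can be mimicked numerically. In the sketch below, the values of $b_j$, $S_j$ and $r_{ij}$ are arbitrary illustrative numbers, not data from the paper. Each right-hand side $r_{yj}S_yS_j$ is simply set to whatever the j-th normal equation says it equals. Entry (j, i) of the array is $b_jb_ir_{ij}S_iS_j$, so three properties follow: row j sums to $b_j$ times the j-th right-hand side, the diagonal holds $b_j^2S_j^2$, and the array is symmetric.

```python
# Hypothetical illustrative values for p = 4 predictors (not real data)
b = [0.8, -0.3, 1.2, 0.5]
S = [2.0, 1.5, 3.0, 0.7]
r = [[1.0, 0.2, -0.1, 0.4],
     [0.2, 1.0, 0.3, 0.0],
     [-0.1, 0.3, 1.0, 0.25],
     [0.4, 0.0, 0.25, 1.0]]      # symmetric, with r_jj = 1
p = len(b)

# right-hand side of normal equation j: sum_i b_i r_ij S_i S_j (= r_yj S_y S_j)
rhs = [sum(b[i] * r[i][j] * S[i] * S[j] for i in range(p)) for j in range(p)]

# covariance term before substitution: sum_j b_j (r_yj S_y S_j)
cov_before = sum(b[j] * rhs[j] for j in range(p))

# the p x p array obtained by multiplying each b_j through its parentheses
T = [[b[j] * b[i] * r[i][j] * S[i] * S[j] for i in range(p)] for j in range(p)]
cov_after = sum(sum(row) for row in T)

print(cov_before, cov_after)
```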
Table 4

Normal Equations and Multiple Correlation Formula for p Raw Score Predictors

Normal Equations

$$\begin{aligned}
r_{y1}S_yS_1 &= b_1S_1^2 + b_2r_{12}S_1S_2 + b_3r_{13}S_1S_3 + \cdots + b_jr_{1j}S_1S_j + \cdots + b_pr_{1p}S_1S_p \\
r_{y2}S_yS_2 &= b_1r_{12}S_1S_2 + b_2S_2^2 + b_3r_{23}S_2S_3 + \cdots + b_jr_{2j}S_2S_j + \cdots + b_pr_{2p}S_2S_p \\
r_{y3}S_yS_3 &= b_1r_{13}S_1S_3 + b_2r_{23}S_2S_3 + b_3S_3^2 + \cdots + b_jr_{3j}S_3S_j + \cdots + b_pr_{3p}S_3S_p \\
&\;\;\vdots \\
r_{yp}S_yS_p &= b_1r_{1p}S_1S_p + b_2r_{2p}S_2S_p + b_3r_{3p}S_3S_p + \cdots + b_jr_{jp}S_jS_p + \cdots + b_pS_p^2
\end{aligned}$$



Multiple Correlation

$$\begin{aligned}
R_{y.x_1x_2\ldots x_p} &= \operatorname{corr}(Y,\hat{Y}) = \operatorname{corr}(Y,\; b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_jx_j + \cdots + b_px_p) \\
&= \frac{\operatorname{cov}(Y,\hat{Y})}{\sqrt{\operatorname{var}(Y)\operatorname{var}(\hat{Y})}}
= \frac{\operatorname{cov}(Y,\; b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_jx_j + \cdots + b_px_p)}{\sqrt{\operatorname{var}(Y)}\sqrt{\operatorname{var}(b_1x_1 + b_2x_2 + \cdots + b_jx_j + \cdots + b_px_p)}} \\
&= \frac{b_1r_{y1}S_yS_1 + b_2r_{y2}S_yS_2 + \cdots + b_jr_{yj}S_yS_j + \cdots + b_pr_{yp}S_yS_p}{S_y\sqrt{b_1^2S_1^2 + b_2^2S_2^2 + \cdots + b_p^2S_p^2 + 2b_1b_2r_{12}S_1S_2 + 2b_1b_3r_{13}S_1S_3 + \cdots + 2b_ib_jr_{ij}S_iS_j + \cdots + 2b_{p-1}b_pr_{p-1,p}S_{p-1}S_p}}
\end{aligned}$$



The $a = \bar{Y}$ term is omitted from the normal equations and the multiple R.
NOTE: The derivation consists of substituting each $r_{yj}S_yS_j$ normal equation term into the covariance term of the multiple R. See text for details.



To facilitate working with such a complex matrix, we will introduce summation notation at this point. As the first step, we count the total number of terms to be summed. An inspection of the expanded covariance matrix above makes it evident that each row consists of p terms. Since there is a total of p such rows, the entire covariance matrix consists of $p \times p = p^2$ terms. For example, in the derivation for three predictors, we worked with three rows, each of which contained three terms, for a total of $3 \times 3 = 3^2 = 9$ terms. In the p predictor model, the covariance matrix consists of two kinds of terms: diagonal terms ($b_1^2S_1^2$ to $b_p^2S_p^2$) and off-diagonal terms. It is evident that there are p such diagonal terms. A little algebra will tell us how many off-diagonal terms are in the covariance matrix. Let X represent the total number of off-diagonal terms. Then:

$$\text{TOTAL MATRIX} = p^2 = p + X$$

or,

$$X = p^2 - p = p(p-1)$$

Thus, the entire covariance matrix consists of p diagonal terms and $p(p-1)$ off-diagonal terms, for a total of $p^2$ terms.
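The count $p^2 = p + p(p-1)$ can be confirmed by brute-force enumeration. The short Python sketch below is an illustration and does not come from the original paper. It visits every (row, column) position of a p × p array and tallies the diagonal and off-diagonal positions.

```python
def count_terms(p):
    """Return (diagonal, off_diagonal) counts for a p x p array of terms."""
    positions = [(i, j) for i in range(1, p + 1) for j in range(1, p + 1)]
    diagonal = sum(1 for i, j in positions if i == j)
    off_diagonal = sum(1 for i, j in positions if i != j)
    return diagonal, off_diagonal


for p in (2, 3, 5, 10):
    d, off = count_terms(p)
    print(p, d, off, d + off)
```

For p = 3 the printed line is `3 3 6 9`, which matches the nine terms of the three predictor derivation.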

We can view the structure of the covariance matrix in another way. This view is the "trick" in understanding the expression of the matrix in summation notation. Notice that the off-diagonal terms exhibit a pattern (as we saw in the two and three predictor cases). Each $b_ib_jr_{ij}S_iS_j$ corresponds to one other term in the matrix that is identical to it. For example, the first off-diagonal term in row one is $b_1b_2r_{12}S_1S_2$, and the first term in row two is identical to it. In general, any off-diagonal term in row i, column j is identical to the term in row j, column i (e.g., row 2, column 5 = row 5, column 2). Thus, the off-diagonal terms consist of a number of identical pairs of terms.

There are $p(p-1)/2$ such pairs of off-diagonal terms. Suppose we halve the total $p^2$ matrix and consider only the upper half, which forms a right triangle. In this halved matrix, we are considering the p diagonal $b_j^2S_j^2$ terms and $p(p-1)/2$ off-diagonal terms. That is, the upper triangle consists of $p + p(p-1)/2$ terms. To represent the entire covariance matrix ($p^2$ terms), simply double the number of off-diagonal terms in the half matrix:

$$p^2 = p + 2\left[\frac{p(p-1)}{2}\right] = p + p(p-1) \text{ total terms.}$$



Examine the matrix of covariance terms for the three predictor case for further clarification.

As explained, the $\operatorname{cov}(Y,\hat{Y})$ matrix consists of $p \times p = p^2$ terms. The total matrix contains p $b_j^2S_j^2$ terms and $p(p-1)$, or $2[p(p-1)/2]$, $b_ib_jr_{ij}S_iS_j$ terms.
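The "halving" argument can also be checked numerically. In this sketch the entries of the symmetric array are hypothetical random numbers that stand in for the $b_ib_jr_{ij}S_iS_j$ terms. The upper triangle is traversed with the limits used in the text, j = 2, ..., p and i = 1, ..., j-1. Adding the diagonal to twice the upper-triangle sum reproduces the sum over the whole array.

```python
import random


def symmetric_array(p, seed=0):
    """Build a hypothetical symmetric p x p array (entry [i][j] == entry [j][i])."""
    rng = random.Random(seed)
    T = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            T[i][j] = T[j][i] = rng.uniform(-1.0, 1.0)
    return T


p = 6
T = symmetric_array(p)
full_sum = sum(T[i][j] for i in range(p) for j in range(p))
diagonal_sum = sum(T[j][j] for j in range(p))
# 1-based limits j = 2..p, i = 1..j-1 become 0-based j = 1..p-1, i = 0..j-1
upper_pairs = [(i, j) for j in range(1, p) for i in range(j)]
upper_sum = sum(T[i][j] for i, j in upper_pairs)

print(len(upper_pairs), p * (p - 1) // 2)
print(full_sum, diagonal_sum + 2 * upper_sum)
```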

Expressing the sum of the diagonal terms in summation notation:

$$b_1^2S_1^2 + b_2^2S_2^2 + \cdots + b_j^2S_j^2 + \cdots + b_p^2S_p^2 = \sum_{j=1}^{p} b_j^2S_j^2$$

The off-diagonal terms can be expressed in summation notation as follows:

$$2\left(b_1b_2r_{12}S_1S_2 + \cdots + b_{p-1}b_pr_{p-1,p}S_{p-1}S_p\right) = 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_ib_jr_{ij}S_iS_j$$



*For those readers familiar with combinatorics, the following may assist in clarifying the logic.

There is a total of p $b_j^2S_j^2$ terms, each taken one at a time. In combinatorial notation, this means:

$$\text{total number of } b_j^2S_j^2 \text{ terms} = \binom{p}{1} = \frac{p!}{1!\,(p-1)!} = \frac{p(p-1)(p-2)\cdots 1}{1\cdot(p-1)(p-2)\cdots 1} = p$$

For the off-diagonal terms, we construct $2\binom{p}{2}$ terms (pairs of identical terms, each combined with all other like terms two at a time). Thus:

$$\text{total number of } b_ib_jr_{ij}S_iS_j \text{ terms} = 2\binom{p}{2} = 2\left[\frac{p!}{2!\,(p-2)!}\right] = 2\left[\frac{p(p-1)(p-2)(p-3)\cdots 1}{2\cdot(p-2)(p-3)\cdots 1}\right] = p(p-1)$$

Hence, the entire covariance matrix consists of:

$$p + p(p-1) = p^2 \text{ terms}$$
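Python's standard library provides `math.comb` (from Python 3.8 on), so the combinatorial counts in this footnote can be checked directly. This is a small illustrative check and is not part of the original argument.

```python
from math import comb


def matrix_term_counts(p):
    """Diagonal terms C(p, 1) and off-diagonal terms 2 * C(p, 2) for p predictors."""
    diagonal = comb(p, 1)
    off_diagonal = 2 * comb(p, 2)
    return diagonal, off_diagonal


for p in (3, 5, 10):
    d, off = matrix_term_counts(p)
    print(f"p={p}: {d} diagonal + {off} off-diagonal = {d + off} = p^2")
```

For p = 10 this prints `p=10: 10 diagonal + 90 off-diagonal = 100 = p^2`.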



For example, in the three predictor model, the first off-diagonal term was seen to be $b_1b_2r_{12}S_1S_2$, and the last was seen to be $b_2b_3r_{23}S_2S_3$. In the case of a 10 predictor model, the first and last terms, respectively, would be $b_1b_2r_{12}S_1S_2$ and $b_9b_{10}r_{9,10}S_9S_{10}$.
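The order in which the double sum visits the off-diagonal terms can be listed explicitly. The illustrative sketch below generates the index pairs (i, j) with j running from 2 to p and, for each j, i running from 1 to j-1. It formats each pair as a term label; the plain-text label format is an assumption chosen for readability.

```python
def off_diagonal_pairs(p):
    """Index pairs (i, j), 1-based, in the order j = 2..p, i = 1..j-1."""
    return [(i, j) for j in range(2, p + 1) for i in range(1, j)]


def term_label(i, j):
    # e.g. (9, 10) -> "b9*b10*r(9,10)*S9*S10"
    return f"b{i}*b{j}*r({i},{j})*S{i}*S{j}"


for p in (3, 10):
    pairs = off_diagonal_pairs(p)
    print(p, len(pairs), term_label(*pairs[0]), term_label(*pairs[-1]))
```

For p = 3 the first and last labels are the $b_1b_2r_{12}S_1S_2$ and $b_2b_3r_{23}S_2S_3$ terms. For p = 10 they are the $b_1b_2r_{12}S_1S_2$ and $b_9b_{10}r_{9,10}S_9S_{10}$ terms.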

We can now express the full covariance matrix in summation notation as:

$$\operatorname{cov}(Y,\hat{Y}) = \sum_{j=1}^{p} b_j^2S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_ib_jr_{ij}S_iS_j$$

Equivalently,

$$\operatorname{cov}(Y,\hat{Y}) = b_1r_{y1}S_yS_1 + b_2r_{y2}S_yS_2 + \cdots + b_jr_{yj}S_yS_j + \cdots + b_pr_{yp}S_yS_p = \sum_{j=1}^{p} b_jr_{yj}S_yS_j$$



Thus,

$$\operatorname{cov}(Y,\hat{Y}) = \sum_{j=1}^{p} b_j^2S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_ib_jr_{ij}S_iS_j = \sum_{j=1}^{p} b_jr_{yj}S_yS_j$$

The latter equation is very important in the final steps.

If the variance terms of the multiple R are examined, we see that $\sqrt{\operatorname{var}(Y)}$ is simply $S_y$ by definition. The term $\sqrt{\operatorname{var}(\hat{Y})}$ can be manipulated by covariance and variance rules to produce the following (see Table 4):

$$\begin{aligned}
\sqrt{\operatorname{var}(\hat{Y})} &= \sqrt{\operatorname{var}(b_1x_1 + b_2x_2 + \cdots + b_jx_j + \cdots + b_px_p)} \\
&= \sqrt{b_1^2S_1^2 + b_2^2S_2^2 + \cdots + b_j^2S_j^2 + \cdots + b_p^2S_p^2 + 2b_1b_2r_{12}S_1S_2 + 2b_1b_3r_{13}S_1S_3 + 2b_1b_4r_{14}S_1S_4 + \cdots + 2b_ib_jr_{ij}S_iS_j + \cdots + 2b_{p-1}b_pr_{p-1,p}S_{p-1}S_p}
\end{aligned}$$

In summation notation:

$$\sqrt{\operatorname{var}(\hat{Y})} = \sqrt{\sum_{j=1}^{p} b_j^2S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_ib_jr_{ij}S_iS_j}$$



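THEREFORE, AFTER MUCH LABOR, WE CAN STATE THE MULTIPLE CORRELATION. For the p predictors $x_1, x_2, \ldots, x_j, \ldots, x_p$:

$$\begin{aligned}
R_{y.x_1x_2\ldots x_p} &= \frac{\sum_{j=1}^{p} b_j^2S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_ib_jr_{ij}S_iS_j}{S_y\sqrt{\sum_{j=1}^{p} b_j^2S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_ib_jr_{ij}S_iS_j}} \\
&= \frac{\sqrt{\sum_{j=1}^{p} b_j^2S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_ib_jr_{ij}S_iS_j}}{S_y} \\
&= \frac{\sqrt{\sum_{j=1}^{p} b_jr_{yj}S_yS_j}}{S_y}
\end{aligned}$$

The first line substitutes the summation forms of $\operatorname{cov}(Y,\hat{Y})$ and $\sqrt{\operatorname{var}(\hat{Y})}$ into the definition of R. The second line follows because the numerator equals the quantity under the square root. The third line uses the identity $\operatorname{cov}(Y,\hat{Y}) = \sum_{j} b_jr_{yj}S_yS_j$ established above.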



END OF PROOF FOR p PREDICTORS 
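As a numerical sanity check of the final result, the Python below fits hypothetical data. It is a sketch under stated assumptions, not the author's procedure. It solves the normal equations in their covariance form, which is the text's equations written as $\operatorname{cov}(Y,x_j) = \sum_i b_i\operatorname{cov}(x_i,x_j)$. It then evaluates the three expressions for R: the ratio containing the double sum, the square root of the double sum divided by $S_y$, and $\sqrt{\sum_j b_jr_{yj}S_yS_j}/S_y$. All three should agree with the ordinary Pearson correlation between Y and $\hat{Y}$. The helper names are hypothetical.

```python
import math


def mean(v):
    return sum(v) / len(v)


def solve(A, c):
    # Gaussian elimination with partial pivoting for A b = c
    p = len(c)
    M = [list(row) + [c[k]] for k, row in enumerate(A)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, p):
            f = M[k][col] / M[col][col]
            for m in range(col, p + 1):
                M[k][m] -= f * M[col][m]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (M[k][p] - sum(M[k][m] * b[m] for m in range(k + 1, p))) / M[k][k]
    return b


# hypothetical raw scores: three predictors, seven observations
X = [[2, 4, 5, 7, 9, 10, 6], [1, 3, 2, 6, 5, 8, 4], [5, 3, 4, 2, 2, 1, 3]]
Y = [3, 7, 6, 11, 12, 16, 9]
n, p = len(Y), len(X)
x = [[t - mean(col) for t in col] for col in X]      # deviation scores x_j
dy = [t - mean(Y) for t in Y]


def cov(u, v):
    # sample covariance of two deviation-score lists; equals r_uv S_u S_v
    return sum(s * t for s, t in zip(u, v)) / (n - 1)


S = [math.sqrt(cov(xj, xj)) for xj in x]
Sy = math.sqrt(cov(dy, dy))
r = [[cov(x[i], x[j]) / (S[i] * S[j]) for j in range(p)] for i in range(p)]
ry = [cov(dy, x[j]) / (Sy * S[j]) for j in range(p)]
b = solve([[cov(x[i], x[j]) for i in range(p)] for j in range(p)],
          [cov(dy, x[j]) for j in range(p)])

# sum_j b_j^2 S_j^2 + 2 * sum_{j=2..p} sum_{i=1..j-1} b_i b_j r_ij S_i S_j
quad = sum(b[j] ** 2 * S[j] ** 2 for j in range(p)) + 2 * sum(
    b[i] * b[j] * r[i][j] * S[i] * S[j] for j in range(1, p) for i in range(j))
R_ratio = quad / (Sy * math.sqrt(quad))
R_root = math.sqrt(quad) / Sy
R_short = math.sqrt(sum(b[j] * ry[j] * Sy * S[j] for j in range(p))) / Sy

# direct Pearson correlation between Y and Y-hat
yhat = [mean(Y) + sum(b[j] * x[j][k] for j in range(p)) for k in range(n)]
dyh = [t - mean(yhat) for t in yhat]
R_direct = cov(dy, dyh) / (Sy * math.sqrt(cov(dyh, dyh)))
print(R_ratio, R_root, R_short, R_direct)
```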






Appendix A 
Normal Equations in Regression Analysis 



Introduction 



In this appendix, we outline a set of procedures to apply in regression analysis for finding normal equations. The procedures are appropriate when:

a) the regression model is linear, and

b) the measures are in raw score form.

If variables are transformed to a nonlinear form before regression analysis, the procedures described in this appendix would not apply. Examples of nonlinear transformations include logarithmic, exponential and square root re-expression or, in general, any case where the exponents of the variables in the regression model are not equal to unity. For example,

$$\hat{Y} = a + b_1x_1 + b_2x_2^2$$

This is a nonlinear mathematical model since the exponent of $x_2$ is not equal to 1.

Deriving normal equations for a given regression model requires knowledge of elementary differential calculus, specifically partial differentiation.* Students who are familiar with calculus may consult any standard textbook for the details (for example, Hoel, Port and Stone, 1971).

*For students who need to review this procedure, or who know some calculus and want to learn the technique, see Goodman, 1977, for a good introduction.
To convey a conceptual understanding of normal equations as they are employed in the least squares procedure, let us take the example of a two predictor model. The mathematical model applied to a distribution assumed linear in each predictor is the one given in the text, namely:

$$\hat{Y} = a + b_1x_1 + b_2x_2$$

The raw score model includes an error component, and the error made in predicting the criterion (Y) with the above model may be negative, zero or positive. The raw score model is:

$$Y = \hat{Y} + e$$

Solving for e, we obtain:

$$Y - \hat{Y} = e$$

This represents the amount of numerical error made on a score-by-score basis when we predict Y with the idealized model, $\hat{Y}$. To obtain an overall indication of the amount of prediction error for the entire raw score distribution, we might be tempted to define

$$\sum(Y - \hat{Y}) = \sum e \quad \text{(over all } n \text{ observations)}$$
The problem with this approach is that the resulting sum on the left side turns out to be exactly zero:¹ $\sum(Y - \hat{Y}) = \sum e = 0$. That is, positive errors cancel out negative errors, leaving zero as the overall sum. This is obviously problematic: no matter how good or bad a particular mathematical model (linear or nonlinear) is for empirical score prediction, we would have no way of determining its utility (using the sensible criterion of minimizing prediction error).

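¹Proof. For two predictors, with $a = \bar{Y}$ and deviation-score predictors:

$$\sum(Y - a - b_1x_1 - b_2x_2) = \sum(Y - \bar{Y} - b_1x_1 - b_2x_2) = \sum(Y - \bar{Y}) - b_1\sum x_1 - b_2\sum x_2 = 0,$$

since each of $\sum(Y - \bar{Y})$, $\sum x_1$ and $\sum x_2$ is zero. The generalization of this for p predictors is obvious.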
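The proof never uses the particular values of $b_1$ and $b_2$. Once $a = \bar{Y}$ and the predictors are deviation scores, the errors sum to zero for any slopes whatsoever. The Python check below illustrates that point with made-up data and arbitrary, deliberately poor slope values.

```python
def residual_sum(Y, X1, X2, b1, b2):
    """Sum of e = Y - a - b1*x1 - b2*x2 with a = mean(Y) and x = deviation scores."""
    n = len(Y)
    a = sum(Y) / n
    m1, m2 = sum(X1) / n, sum(X2) / n
    return sum(y - a - b1 * (u - m1) - b2 * (v - m2) for y, u, v in zip(Y, X1, X2))


# hypothetical raw scores
Y = [3, 6, 7, 11, 12, 16]
X1 = [2, 4, 5, 7, 9, 10]
X2 = [1, 3, 2, 6, 5, 8]
for b1, b2 in [(0.0, 0.0), (1.4, 0.2), (-5.0, 12.0)]:
    print(b1, b2, round(residual_sum(Y, X1, X2, b1, b2), 12))
```

Every line prints a sum that is zero up to rounding. This is why $\sum e$ cannot distinguish a good model from a bad one.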
For these reasons, the most widely used and accepted procedure for finding normal equations is based on the least squares criterion; i.e.,

$$\sum(Y - \hat{Y})^2 = \sum(Y - a - b_1x_1 - b_2x_2)^2 = \sum e^2 = \text{minimum}$$

(The summation ranges from i = 1 to i = n, that is, over the entire set of observations.) In words, least squares states: find the numerical values of a, $b_1$ and $b_2$ that, upon substitution, make the sum of squared prediction errors as small as possible.
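To see the least squares criterion at work, the sketch below uses hypothetical data. To keep it short, the fitting step uses the one predictor least squares values, $a = \bar{Y}$ and $b_1 = \sum x_1Y / \sum x_1^2$, rather than the two predictor model. It computes $\sum e^2$ at those values and at nearby values of a and $b_1$. Moving away from the least squares values in any direction increases the sum of squared errors.

```python
def sse(Y, x, a, b):
    """Sum of squared errors, sum (Y - a - b*x)^2, for a deviation-score predictor x."""
    return sum((y - a - b * t) ** 2 for y, t in zip(Y, x))


# hypothetical raw scores for one predictor
Y = [3, 6, 7, 11, 12, 16]
X = [2, 4, 5, 7, 9, 10]
mx = sum(X) / len(X)
x = [t - mx for t in X]                                   # deviation scores

a_ls = sum(Y) / len(Y)                                    # least squares a = Y-bar
b_ls = sum(t * y for t, y in zip(x, Y)) / sum(t * t for t in x)
best = sse(Y, x, a_ls, b_ls)

# nudge a and b away from the least squares values; every nudge raises the SSE
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1), (0.3, -0.2)]:
    print(da, db, sse(Y, x, a_ls + da, b_ls + db) > best)
```

Each line ends in `True`. At the least squares values $\sum e = 0$, $\sum xe = 0$ and $\sum x = 0$, so the increase works out to exactly $n(\Delta a)^2 + (\Delta b)^2\sum x^2$.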

The reader is already aware of one least squares type of result from elementary statistics. A kind of least squares criterion (and procedure) is used in defining the sample variance of a distribution; i.e.,

$$S^2 = \frac{1}{n-1}\sum(Y - \bar{Y})^2$$

The arithmetic mean, $\bar{Y}$, is used in variance formulas (instead of the median or any other number) because the resulting sum of squared deviations, and hence the variance, is smaller when the mean is used than when any other number (or combination of numbers) in that distribution is used.* This is derived through the same calculus procedure used in deriving normal equations, and it is based on the same principle: optimization, or minimization.

*Take an example: the scores Y = 2, 4, 8, 10 and 11, for which $\bar{Y} = 7$. Compute the squared deviations $(Y-2)^2$, $(Y-4)^2$, $(Y-8)^2$, $(Y-10)^2$, $(Y-11)^2$ and $(Y-7)^2$ for each score. Then find each sum of squared deviations and compare it against $\sum(Y - \bar{Y})^2 = \sum(Y-7)^2$. (The $1/(n-1)$ can be ignored because it is a constant and has no material bearing on the result.) Only $\sum(Y-7)^2$ gives the smallest sum of squared deviations.
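The footnote's exercise can be worked in a few lines. The Python below is an illustrative check. It computes the sum of squared deviations of the scores 2, 4, 8, 10 and 11 about each candidate value named in the footnote.

```python
scores = [2, 4, 8, 10, 11]
mean_score = sum(scores) / len(scores)          # 7.0


def sum_sq_dev(values, c):
    """Sum of squared deviations of values about the constant c."""
    return sum((v - c) ** 2 for v in values)


for c in (2, 4, 8, 10, 11, 7):
    print(c, sum_sq_dev(scores, c))
```

The sums are 185, 105, 65, 105, 140 and 60, respectively. The mean, 7, therefore gives the smallest sum, as the footnote asserts.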
Our task in regression analysis is to find numerical values for the terms in the model that satisfy the least squares criterion of minimum prediction error. The resulting values, when substituted into the regression equation, satisfy the criterion of minimization. In essence, we set up p+1 equations, one for each term in the model (p is the number of predictors, and the 1 corresponds to the intercept term). The equations are then solved simultaneously to obtain computing formulas for the numerical values of the p+1 terms in the model. Finally, each observation's predictor scores (together with the intercept term) are passed through the resulting prediction equation to find a unique predicted criterion score for each observation in the data set. The rest is statistical theory (see Lindeman, et al. for an excellent discussion of regression theory).

To take the two predictor example once again,

$$\sum(Y - a - b_1x_1 - b_2x_2)^2 = \sum e^2 = \text{minimum}.$$

We are not interested in finding a computational formula for a, $b_1$ and $b_2$. Our goal is to stop one step short of doing that. We are interested in finding the normal equations and simplifying them so that they can be substituted into the multiple R.

Plan 

We will now set down a plan for finding the normal equations. A four phase plan is used throughout this appendix to structure the presentation:

A. state the regression model, $\hat{Y}$

B. state the mathematical function of the least squares criterion, $\sum(Y - \hat{Y})^2$

C. derive the normal equations for each of the terms in the model

D. summarize the normal equations

Finding Normal Equations for the Two Predictor Model

Let us apply the four phase plan first to the two predictor case.

A. The regression function is

$$\hat{Y} = a + b_1x_1 + b_2x_2$$

B. The least squares criterion is

$$\sum(Y - \hat{Y})^2 = \sum(Y - a - b_1x_1 - b_2x_2)^2 = \sum e^2$$



C. The procedures for deriving the normal equations are:

1. For the intercept term, a, we need to:

a) drop the exponent 2 and set the function equal to 0

b) distribute the summation operator

c) apply rules of summation for constants

d) solve in terms of the criterion variable, Y

e) substitute descriptive statistics and simplify

Applying each step in a) through e) produces:

a) $\sum(Y - a - b_1x_1 - b_2x_2) = 0$

b) $\sum Y - \sum a - \sum b_1x_1 - \sum b_2x_2 = 0$

c) $\sum Y - na - b_1\sum x_1 - b_2\sum x_2 = 0$

d) $\sum Y = na + b_1\sum x_1 + b_2\sum x_2$

e) $\sum Y = na + b_1(0) + b_2(0)$

Recall that $\sum x_1 = \sum x_2 = 0$. Dividing through by n gives us the normal equation for a (in simplified form):

$$a = \bar{Y}$$
2. The procedures for finding the normal equation for $b_1$ are:

a) drop the exponent 2 and set the function equal to 0

b) multiply the function by $x_1$

c) distribute the $x_1$ term

d) distribute the summation operator

e) apply rules of summation for constants

f) solve in terms of the criterion variable, Y

g) substitute descriptive statistics and simplify

Applying each step in turn produces:

a) $\sum(Y - a - b_1x_1 - b_2x_2) = 0$

b) $\sum(Y - a - b_1x_1 - b_2x_2)x_1 = 0$

c) $\sum(Yx_1 - ax_1 - b_1x_1^2 - b_2x_1x_2) = 0$

d) $\sum Yx_1 - \sum ax_1 - \sum b_1x_1^2 - \sum b_2x_1x_2 = 0$

e) $\sum Yx_1 - a\sum x_1 - b_1\sum x_1^2 - b_2\sum x_1x_2 = 0$

f) $\sum Yx_1 = a\sum x_1 + b_1\sum x_1^2 + b_2\sum x_1x_2$

g) Since $\sum Yx_1 = (n-1)r_{y1}S_yS_1$, $\sum x_1^2 = (n-1)S_1^2$ and $\sum x_1x_2 = (n-1)r_{12}S_1S_2$, we can substitute these quantities and obtain:

$$(n-1)r_{y1}S_yS_1 = 0 + b_1(n-1)S_1^2 + b_2(n-1)r_{12}S_1S_2$$

(recall that $\sum x_1 = 0$). If we divide the last equation by $(n-1)$, we obtain:

$$r_{y1}S_yS_1 = b_1S_1^2 + b_2r_{12}S_1S_2$$

This is the normal equation, in simplified form, that we used in the derivation (see Table 2).
3. The steps for finding the normal equation for $b_2$ parallel those for $b_1$:

a) drop the exponent 2 and set the function equal to 0

b) multiply the function by $x_2$

c) distribute the $x_2$ term

d) distribute the summation operator

e) apply rules of summation for constants

f) solve in terms of the criterion variable, Y

g) substitute descriptive statistics and simplify

Applying each step in order:

a) $\sum(Y - a - b_1x_1 - b_2x_2) = 0$

b) $\sum(Y - a - b_1x_1 - b_2x_2)x_2 = 0$

c) $\sum(Yx_2 - ax_2 - b_1x_1x_2 - b_2x_2^2) = 0$

d) $\sum Yx_2 - \sum ax_2 - \sum b_1x_1x_2 - \sum b_2x_2^2 = 0$

e) $\sum Yx_2 - a\sum x_2 - b_1\sum x_1x_2 - b_2\sum x_2^2 = 0$

f) $\sum Yx_2 = a\sum x_2 + b_1\sum x_1x_2 + b_2\sum x_2^2$

g) Since $\sum Yx_2 = (n-1)r_{y2}S_yS_2$, $\sum x_1x_2 = (n-1)r_{12}S_1S_2$ and $\sum x_2^2 = (n-1)S_2^2$, we can substitute these quantities and obtain:

$$(n-1)r_{y2}S_yS_2 = 0 + b_1(n-1)r_{12}S_1S_2 + b_2(n-1)S_2^2$$

If we divide through by $(n-1)$, we have:

$$r_{y2}S_yS_2 = b_1r_{12}S_1S_2 + b_2S_2^2$$

This is the simplified form of the normal equation for $b_2$ that was used in the derivation (see Table 2).
D. We now recapitulate. As noted, a normal equation is obtained at the point where we solve in terms of the criterion variable, Y. Subsequent steps are used to simplify.

The normal equations for a, $b_1$ and $b_2$ were:

For a: $\sum Y = na + b_1\sum x_1 + b_2\sum x_2$

For $b_1$: $\sum Yx_1 = a\sum x_1 + b_1\sum x_1^2 + b_2\sum x_1x_2$

For $b_2$: $\sum Yx_2 = a\sum x_2 + b_1\sum x_1x_2 + b_2\sum x_2^2$

When we simplified the normal equations, we obtained the following set used in the derivation for two predictors:

$$\begin{aligned}
r_{y1}S_yS_1 &= b_1S_1^2 + b_2r_{12}S_1S_2 \\
r_{y2}S_yS_2 &= b_1r_{12}S_1S_2 + b_2S_2^2
\end{aligned}$$



Readers of the 1982c paper should recognize the remarkable similarity between the raw score and standard score normal equations. If the above variables were standardized, each standard deviation would equal 1 and a would equal 0, making each normal equation identical to its standard score counterpart.



We actually disregarded the term a in the derivation because it was seen to "drop out" when it was included in the algebra. It is included here because the intercept term is part of the regression equation used to calculate criterion scores. The formula used is:

$$\hat{Y} = \bar{Y} + b_1x_1 + b_2x_2$$

See Lindeman, et al. for additional methods of writing this equation.
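For two predictors the simplified normal equations can be solved by hand. Solving the pair for $b_1$ and $b_2$ (for example, by Cramer's rule) gives $b_1 = \frac{S_y}{S_1}\cdot\frac{r_{y1} - r_{y2}r_{12}}{1 - r_{12}^2}$ and $b_2 = \frac{S_y}{S_2}\cdot\frac{r_{y2} - r_{y1}r_{12}}{1 - r_{12}^2}$. The Python sketch below uses hypothetical data. It computes these weights, forms $\hat{Y} = \bar{Y} + b_1x_1 + b_2x_2$, and checks that both simplified normal equations hold. The helper names are illustrative.

```python
import math


def stats(v):
    """Mean and sample standard deviation (n - 1 divisor)."""
    m = sum(v) / len(v)
    return m, math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))


def pearson(u, v):
    mu, su = stats(u)
    mv, sv = stats(v)
    return sum((s - mu) * (t - mv) for s, t in zip(u, v)) / ((len(u) - 1) * su * sv)


# hypothetical raw scores
Y = [3, 6, 7, 11, 12, 16]
X1 = [2, 4, 5, 7, 9, 10]
X2 = [1, 3, 2, 6, 5, 8]

my, Sy = stats(Y)
m1, S1 = stats(X1)
m2, S2 = stats(X2)
ry1, ry2, r12 = pearson(Y, X1), pearson(Y, X2), pearson(X1, X2)

# Cramer's rule applied to the two simplified normal equations
b1 = (Sy / S1) * (ry1 - ry2 * r12) / (1 - r12 ** 2)
b2 = (Sy / S2) * (ry2 - ry1 * r12) / (1 - r12 ** 2)

# prediction equation Y-hat = Y-bar + b1*x1 + b2*x2, with deviation scores x
yhat = [my + b1 * (u - m1) + b2 * (v - m2) for u, v in zip(X1, X2)]

# left side minus right side of each normal equation (both should be ~0)
eq1 = ry1 * Sy * S1 - (b1 * S1 ** 2 + b2 * r12 * S1 * S2)
eq2 = ry2 * Sy * S2 - (b1 * r12 * S1 * S2 + b2 * S2 ** 2)
print(round(b1, 4), round(b2, 4), eq1, eq2)
```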
Finding Normal Equations for p Predictors

The rules and method for deriving a set of normal equations when there are more than two predictors are generalizations of the two (or one) predictor case.¹ We will show two methods for the general case. The first method uses the four phase plan. The second is a short-cut technique, but it depends on first showing the longer method.



¹What are the normal equations for the one predictor model? The reader may find it instructive to derive the normal equations for this linear model, using the above procedures as guidelines.

ANSWER:

$$a = \bar{Y}$$

$$r_{y1}S_yS_1 = b_1S_1^2$$

The "multiple" R in this case is the absolute value of the simple Pearson product-moment correlation, $r_{y1}$, which equals $b_1S_1/S_y$. This is obtained from the second equation. Thus, upon substitution, the regression (prediction) equation is:

$$\hat{Y} = \bar{Y} + r_{y1}\frac{S_y}{S_1}x_1$$
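The one predictor answer can be checked the same way. In this sketch, which uses hypothetical data, $b_1 = r_{y1}S_y/S_1$ is taken from the second normal equation. The p = 1 case of the multiple R formula, $\sqrt{b_1r_{y1}S_yS_1}/S_y$, is then compared with the Pearson correlation. The data are chosen with a negative relationship to show that R matches $r_{y1}$ only in absolute value.

```python
import math


def mean_sd(v):
    """Mean and sample standard deviation (n - 1 divisor)."""
    m = sum(v) / len(v)
    return m, math.sqrt(sum((t - m) ** 2 for t in v) / (len(v) - 1))


# hypothetical scores with a negative relationship between X1 and Y
Y = [14, 12, 11, 8, 6, 3]
X1 = [1, 2, 4, 5, 7, 9]

my, Sy = mean_sd(Y)
m1, S1 = mean_sd(X1)
ry1 = sum((y - my) * (u - m1) for y, u in zip(Y, X1)) / ((len(Y) - 1) * Sy * S1)

b1 = ry1 * Sy / S1                       # from r_y1 S_y S_1 = b_1 S_1^2


def predict(x_raw):
    """Y-hat = Y-bar + b_1 x_1, where x_1 is the deviation score of x_raw."""
    return my + b1 * (x_raw - m1)


R = math.sqrt(b1 * ry1 * Sy * S1) / Sy   # p = 1 case of the multiple R formula
print(round(ry1, 4), round(R, 4))
```

Here $r_{y1}$ is negative while R is positive, because $b_1r_{y1}S_yS_1 = r_{y1}^2S_y^2$.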



Applying the four phase plan gives the following results for the general case.

A. The regression model is:

$$\hat{Y} = a + b_1x_1 + b_2x_2 + \cdots + b_jx_j + \cdots + b_px_p$$

B. The function to be minimized according to the least squares criterion is:

$$\sum(Y - a - b_1x_1 - b_2x_2 - \cdots - b_jx_j - \cdots - b_px_p)^2$$

C. The procedures for finding the normal equations for a and any $b_j$ term are as follows:

1. In deriving the normal equation for a, the result is always the same regardless of the number of predictors: $a = \bar{Y}$.

2. The normal equation for any $b_j$ term can be found in seven steps:

a) drop the exponent 2, and set the function equal to 0

b) multiply the function by $x_j$

c) distribute the $x_j$ term

d) distribute the summation operator

e) apply rules of summation for constants

f) solve in terms of the criterion variable, Y

g) substitute descriptive statistics and simplify
Applying these steps in turn produces:

a) $\sum(Y - a - b_1x_1 - b_2x_2 - \cdots - b_jx_j - \cdots - b_px_p) = 0$

b) $\sum(Y - a - b_1x_1 - b_2x_2 - \cdots - b_jx_j - \cdots - b_px_p)x_j = 0$

c) $\sum(Yx_j - ax_j - b_1x_1x_j - b_2x_2x_j - \cdots - b_jx_j^2 - \cdots - b_px_px_j) = 0$

d) $\sum Yx_j - \sum ax_j - \sum b_1x_1x_j - \sum b_2x_2x_j - \cdots - \sum b_jx_j^2 - \cdots - \sum b_px_px_j = 0$

e) $\sum Yx_j - a\sum x_j - b_1\sum x_1x_j - b_2\sum x_2x_j - \cdots - b_j\sum x_j^2 - \cdots - b_p\sum x_px_j = 0$

f) $\sum Yx_j = a\sum x_j + b_1\sum x_1x_j + b_2\sum x_2x_j + b_3\sum x_3x_j + \cdots + b_j\sum x_j^2 + \cdots + b_p\sum x_px_j$

g) $(n-1)r_{yj}S_yS_j = 0 + b_1(n-1)r_{1j}S_1S_j + b_2(n-1)r_{2j}S_2S_j + \cdots + b_j(n-1)S_j^2 + \cdots + b_p(n-1)r_{pj}S_pS_j$

Dividing through by $(n-1)$:

$$r_{yj}S_yS_j = b_1r_{1j}S_1S_j + b_2r_{2j}S_2S_j + \cdots + b_jS_j^2 + \cdots + b_pr_{pj}S_pS_j$$

Thus, the normal equations for any number of predictors in the regression model consist of $a = \bar{Y}$ and p normal equations of the general form $r_{yj}S_yS_j = \sum_{i=1}^{p} b_ir_{ij}S_iS_j$ defined above (with $r_{jj} = 1$).
Alternate Procedure

The above normal equation for any $b_j$ term ($r_{yj}S_yS_j$) is a general result. We now present a much simpler procedure that makes use of this fact.



Recall that the simple correlation of any variable with itself is equal to 1. That is, $r_{11} = r_{22} = \cdots = r_{jj} = \cdots = r_{pp} = 1$. Also recall that the covariance of any variable with itself is equal to the variance of that variable; that is, $\operatorname{cov}(x_1,x_1) = S_1^2$, $\operatorname{cov}(x_2,x_2) = S_2^2$ or, in general, $\operatorname{cov}(x_j,x_j) = S_j^2$. Another way to denote $\operatorname{cov}(x_j,x_j)$ is $S_{jj}$, so in general we can write $\operatorname{cov}(x_j,x_j) = S_j^2$ or $S_{jj} = S_j^2$.

From these facts, it is possible to write down an entire set of normal equations for any number of predictors. If the general result for $r_{yj}S_yS_j$ holds for any $b_j$ term, then it holds for j = 1, j = 2, j = 3, ..., j = p. For example, assume p = 2 predictors. We know that the set of normal equations will consist of $2 \times 2 = 2^2 = 4$ terms. Thus, first write out the general result for $r_{yj}S_yS_j$ twice, as follows:

$$\begin{aligned}
r_{yj}S_yS_j &= b_1r_{1j}S_1S_j + b_2r_{2j}S_2S_j \\
r_{yj}S_yS_j &= b_1r_{1j}S_1S_j + b_2r_{2j}S_2S_j
\end{aligned}$$

Now, substitute the appropriate j value, j = 1 for line 1 and j = 2 for line 2:

$$\begin{aligned}
r_{y1}S_yS_1 &= b_1r_{11}S_1S_1 + b_2r_{21}S_2S_1 \\
r_{y2}S_yS_2 &= b_1r_{12}S_1S_2 + b_2r_{22}S_2S_2
\end{aligned}$$

OR

$$\begin{aligned}
r_{y1}S_yS_1 &= b_1S_1^2 + b_2r_{21}S_2S_1 \\
r_{y2}S_yS_2 &= b_1r_{12}S_1S_2 + b_2S_2^2
\end{aligned}$$

OR

$$\begin{aligned}
r_{y1}S_yS_1 &= b_1S_1^2 + b_2r_{12}S_1S_2 \\
r_{y2}S_yS_2 &= b_1r_{12}S_1S_2 + b_2S_2^2
\end{aligned}$$

The last set shows the subscripts of the correlations between predictors, and of the predictor standard deviations, written so that the first subscript is less than the second. As mentioned in the text, this convention makes it easier to read the matrix (and to see the symmetry of the off-diagonal terms).



Example for Five Predictors

To exemplify the procedures for p predictors, we will work through the solution of the normal equations for five predictors, using the short-cut method. The long method could be used by applying the steps listed above for any $b_j$ term. Because the shorter method gives identical results, however, we will not work through the longer method.

We begin by writing out the $5^2 = 25$ terms for the general $r_{yj}S_yS_j$ normal equation. That is, write out $r_{yj}S_yS_j$ on five separate lines:

$$\begin{aligned}
r_{yj}S_yS_j &= b_1r_{1j}S_1S_j + b_2r_{2j}S_2S_j + b_3r_{3j}S_3S_j + b_4r_{4j}S_4S_j + b_5r_{5j}S_5S_j \\
r_{yj}S_yS_j &= b_1r_{1j}S_1S_j + b_2r_{2j}S_2S_j + b_3r_{3j}S_3S_j + b_4r_{4j}S_4S_j + b_5r_{5j}S_5S_j \\
r_{yj}S_yS_j &= b_1r_{1j}S_1S_j + b_2r_{2j}S_2S_j + b_3r_{3j}S_3S_j + b_4r_{4j}S_4S_j + b_5r_{5j}S_5S_j \\
r_{yj}S_yS_j &= b_1r_{1j}S_1S_j + b_2r_{2j}S_2S_j + b_3r_{3j}S_3S_j + b_4r_{4j}S_4S_j + b_5r_{5j}S_5S_j \\
r_{yj}S_yS_j &= b_1r_{1j}S_1S_j + b_2r_{2j}S_2S_j + b_3r_{3j}S_3S_j + b_4r_{4j}S_4S_j + b_5r_{5j}S_5S_j
\end{aligned}$$

Substitute the appropriate j value (j = 1 for line 1, j = 2 for line 2, etc.). Then set $r_{11} = r_{22} = \cdots = r_{55} = 1$, and set $S_1S_1 = S_1^2$, $S_2S_2 = S_2^2$, etc.:

$$\begin{aligned}
r_{y1}S_yS_1 &= b_1S_1^2 + b_2r_{21}S_2S_1 + b_3r_{31}S_3S_1 + b_4r_{41}S_4S_1 + b_5r_{51}S_5S_1 \\
r_{y2}S_yS_2 &= b_1r_{12}S_1S_2 + b_2S_2^2 + b_3r_{32}S_3S_2 + b_4r_{42}S_4S_2 + b_5r_{52}S_5S_2 \\
r_{y3}S_yS_3 &= b_1r_{13}S_1S_3 + b_2r_{23}S_2S_3 + b_3S_3^2 + b_4r_{43}S_4S_3 + b_5r_{53}S_5S_3 \\
r_{y4}S_yS_4 &= b_1r_{14}S_1S_4 + b_2r_{24}S_2S_4 + b_3r_{34}S_3S_4 + b_4S_4^2 + b_5r_{54}S_5S_4 \\
r_{y5}S_yS_5 &= b_1r_{15}S_1S_5 + b_2r_{25}S_2S_5 + b_3r_{35}S_3S_5 + b_4r_{45}S_4S_5 + b_5S_5^2
\end{aligned}$$

If one desires, the subscripts of the terms in the upper right hand triangle may be reversed so that the first is less than the second. The result is the same set of normal equations that would be obtained if the longer method were used to derive them.
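The short-cut method is purely mechanical: write the general equation p times, substitute j, then simplify $r_{jj}S_jS_j$ to $S_j^2$. That makes it easy to automate. The Python below is an illustrative sketch and is not part of the original paper. It prints the normal equations as plain text in an ad hoc notation (`S1^2`, `r(1,2)`, and so on). The `ordered` flag is a hypothetical option that reverses subscripts so that the first is the smaller.

```python
def normal_equations(p, ordered=False):
    """Write out the p simplified raw-score normal equations as text lines.

    Term i of equation j is b_i r_ij S_i S_j; when i == j it becomes b_j S_j^2
    because r_jj = 1 and S_j S_j = S_j^2.  With ordered=True the subscripts of
    off-diagonal terms are reversed where needed so the first is the smaller.
    """
    lines = []
    for j in range(1, p + 1):
        terms = []
        for i in range(1, p + 1):
            if i == j:
                terms.append(f"b{i}*S{i}^2")
            else:
                first, second = (min(i, j), max(i, j)) if ordered else (i, j)
                terms.append(f"b{i}*r({first},{second})*S{first}*S{second}")
        lines.append(f"r(y,{j})*Sy*S{j} = " + " + ".join(terms))
    return lines


for line in normal_equations(5, ordered=True):
    print(line)
```

For five predictors the first printed line is `r(y,1)*Sy*S1 = b1*S1^2 + b2*r(1,2)*S1*S2 + b3*r(1,3)*S1*S3 + b4*r(1,4)*S1*S4 + b5*r(1,5)*S1*S5`. Calling `normal_equations(5)` without the flag reproduces the unreversed form shown above.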




The author would be pleased to receive comments and reactions from readers of this paper and of the others in this series. My intention is to prepare a textbook of proofs and derivations for social science students. I have long felt the need to bridge the gap between mathematical statistics and the standard applied statistics (and psychometric) textbooks currently on the market. The mathematical sophistication of students entering college and university is rising steadily, and I feel that a textbook such as I am contemplating would make a contribution. It is true that a "real" understanding of statistical (and probability) theory requires substantial mathematical coursework. It is nonetheless possible to offer more explanation and justification of results in probability and statistics than is usual. It is my belief that a textbook showing detailed presentations of proofs and derivations would be a welcome addition to the market.

I would like to hear from readers (students, professors and others) regarding these papers. For example, are they clear? Are there proofs (in statistics or psychometrics) that you would like to see in this format? Please remember that, at this time, I am limiting my selections to those that can be presented with algebra.

I welcome comments on any level from readers of these papers. My mailing address is:

Francis J. O'Brien, Jr. 

106 Morningside Drive, Apartment # 5

New York, New York 10027 
Appendix B

ERRATA for "A derivation of the sample multiple correlation formula for standard scores" (ED 223 429)

Page, location: incorrect text; correct to

2, Derivation for Two Predictors: "Let us review some concepts; notation ..." correct to "let us review some of the concepts, notation ..."

3, footnote: "If it is understood that the all summations range from to ..." correct to "If it is understood that all summations range from i=1 to i=n, then we can drop the summation limits altogether;"

21, first formula: $\operatorname{cov}(Z_Y, B_1Z_1, B_2Z_2, B_3Z_3, \ldots, B_jZ_j, \ldots, B_pZ_p)$ correct to $\operatorname{cov}(Z_Y,\; B_1Z_1 + B_2Z_2 + B_3Z_3 + \cdots + B_jZ_j + \cdots + B_pZ_p)$

27, statement under Plan: $\operatorname{corr}(Z_Y, B_1Z_1, B_2Z_2, B_3Z_3, \ldots, B_jZ_j, \ldots)$ correct to $\operatorname{corr}(Z_Y,\; B_1Z_1 + B_2Z_2 + \cdots + B_jZ_j + \cdots + B_pZ_p)$

27, two lines under previous erratum: add period

31, line 2: correct to "demonstrate"

33, 4 sentences from bottom: "consisdered" correct to "considered"; "First" correct to "first"

Corrected prose is underlined, but formulas are rewritten with applied corrections only.
NOTE: "Page" refers to original numbers in upper right hand corner.